Dialogue from these movies and TV shows has been used by companies such as Apple and Anthropic to train AI systems. The files within this data set are not scripts, exactly. Rather, they are subtitles taken from a website called OpenSubtitles.org. Users of the site typically extract subtitles from DVDs, Blu-ray discs, and internet streams using optical-character-recognition (OCR) software.
Source: There’s No Longer Any Doubt That Hollywood Writing Is Powering AI