<p>This dataset contains 1,093 movie scripts collected from the website <a href="http://imsdb.com" rel="nofollow">imsdb.com</a>, each in a separate text file. The file imsdb_sample.txt contains the titles of all movies (the corresponding file names have the form Script_TITLE.txt). The website was crawled in January 2017. Some scripts are not present, either because they were missing from <a href="http://imsdb.com" rel="nofollow">imsdb.com</a> or because they were uploaded as PDF files. Please note that (i) the original scripts were uploaded to the website by individual users, so they might not correspond exactly to the actual movie scripts and may contain typos; (ii) HTML formatting was not consistent across the website, and so neither is the formatting of the resulting text files. Even considering (i) and (ii), the quality seems good on average, and the dataset can be easily used for text-mining tasks.</p>
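The file layout described above can be read with a few lines of Python. The following is a minimal sketch, not part of the dataset itself. It assumes that imsdb_sample.txt lists one title per line, that each title appears in the file name exactly as written in the index, that all files sit in a single directory, and that the text is readable as UTF-8. None of these details are confirmed by the description, and the helper names `script_path` and `load_scripts` are hypothetical.

```python
from pathlib import Path


def script_path(title: str, data_dir: Path) -> Path:
    """Build the expected file name for a title (Script_TITLE.txt)."""
    return data_dir / f"Script_{title}.txt"


def load_scripts(data_dir, index_name="imsdb_sample.txt"):
    """Return (scripts, missing) for the titles listed in the index file.

    scripts maps each title to the text of its script file;
    missing lists titles whose file was not found in data_dir.
    Assumption: the index holds one title per line.
    """
    data_dir = Path(data_dir)
    index = data_dir / index_name
    scripts = {}
    missing = []
    # errors="replace" keeps going if a file contains undecodable bytes;
    # the actual encoding of the files is an assumption.
    for line in index.read_text(encoding="utf-8", errors="replace").splitlines():
        title = line.strip()
        if not title:
            continue  # skip blank lines in the index
        path = script_path(title, data_dir)
        if path.is_file():
            scripts[title] = path.read_text(encoding="utf-8", errors="replace")
        else:
            missing.append(title)
    return scripts, missing
```

Calling `load_scripts("path/to/dataset")` returns the loaded texts together with any titles that could not be matched to a file, which is a quick way to check whether the naming assumption holds for a given copy of the data. Because formatting is inconsistent across files, any further cleaning (whitespace normalization, scene-heading detection) is best treated as a separate step.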