
Category: Project

Description: MultiSub is a multi-parallel corpus of movie subtitles. The data in all languages is automatically lemmatized, POS-tagged and syntactically parsed.
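The project's tags list CoNLL-U and UD. Assuming the processed files use the standard ten-column CoNLL-U layout from Universal Dependencies (this page does not confirm it, and XML is also tagged), you could read each token's lemma, POS tag, and dependency head and relation with a small parser like the one below. It is a minimal sketch. It keeps only a few columns and skips comment lines, multiword-token ranges (e.g. `1-2`) and empty nodes (e.g. `8.1`). `Token` and `read_conllu` are hypothetical names, and the sample sentence is a made-up toy, not corpus data.

```python
import io
from dataclasses import dataclass
from typing import Iterator, List, Optional, TextIO


@dataclass
class Token:
    id: int
    form: str
    lemma: str
    upos: str              # universal POS tag (column 4)
    head: Optional[int]    # index of the syntactic head, 0 = root; None if "_"
    deprel: str            # dependency relation (column 8)


def read_conllu(stream: TextIO) -> Iterator[List[Token]]:
    """Yield one list of Tokens per sentence from a CoNLL-U stream (assumed format)."""
    sentence: List[Token] = []
    for raw in stream:
        line = raw.rstrip("\r\n")
        if not line:  # a blank line ends a sentence
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):  # sentence-level comments, e.g. "# text = ..."
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        if "-" in cols[0] or "." in cols[0]:  # multiword ranges, empty nodes
            continue
        head = None if cols[6] == "_" else int(cols[6])
        sentence.append(Token(int(cols[0]), cols[1], cols[2], cols[3], head, cols[7]))
    if sentence:  # the file may not end with a blank line
        yield sentence


if __name__ == "__main__":
    # Toy example, not taken from MultiSub.
    sample = (
        "# text = I run\n"
        "1\tI\tI\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
        "2\trun\trun\tVERB\t_\t_\t0\troot\t_\t_\n"
        "\n"
    )
    for sent in read_conllu(io.StringIO(sample)):
        print([(t.form, t.lemma, t.upos, t.head, t.deprel) for t in sent])
        # [('I', 'I', 'PRON', 2, 'nsubj'), ('run', 'run', 'VERB', 0, 'root')]
```

Because the reader is a generator, it streams sentences one at a time, so a large subtitle file does not have to be held in memory.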

License: CC-By Attribution 4.0 International

Wiki

This document contains practical information for using the data.

1. NAMING CONVENTIONS

  • Documents corresponding to a movie's subtitles: the name of the movie in camel case, followed by a hyphen and the ISO code of the language: NameOfMovie-de (e.g., Avatar-en)

  • Documents corresponding to a series' subtitles: the name of the show in camel case (consisting of the name, season number and episode number), followed by a hyphen and the ISO code of t…
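You can sketch the movie naming rule above in Python. This is a minimal illustration, not tooling shipped with the corpus. The helper names (`camel_case`, `movie_document_name`, `split_document_name`, `group_parallel`) are hypothetical. Dropping apostrophes and treating other punctuation as word breaks is an assumption, and language codes are assumed to be two or three lowercase letters, as in the examples. The series rule is truncated above, so episode names are not built here. `split_document_name` splits only on the last hyphen, so it should also work for series names, assuming they follow the same `<Name>-<iso>` shape.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical helpers for the <NameInCamelCase>-<iso code> convention, e.g. "Avatar-en".
# Assumption: the ISO code is 2-3 lowercase letters at the end, after the last hyphen.
_NAME_RE = re.compile(r"^(?P<name>.+)-(?P<lang>[a-z]{2,3})$")


def camel_case(title: str) -> str:
    """Turn a free-text title into upper camel case ("name of movie" -> "NameOfMovie")."""
    # Assumption: apostrophes are dropped rather than treated as word breaks.
    title = re.sub(r"['\u2019]", "", title)
    words = [w for w in re.split(r"[\W_]+", title) if w]
    return "".join(w[:1].upper() + w[1:] for w in words)


def movie_document_name(title: str, iso_code: str) -> str:
    """Build a movie document name such as "Avatar-en"."""
    return f"{camel_case(title)}-{iso_code.lower()}"


def split_document_name(stem: str) -> Tuple[str, str]:
    """Split a document name (without file extension) into (name, iso_code)."""
    match = _NAME_RE.match(stem)
    if match is None:
        raise ValueError(f"not a <Name>-<iso> document name: {stem!r}")
    return match.group("name"), match.group("lang")


def group_parallel(stems: Iterable[str]) -> Dict[str, Dict[str, str]]:
    """Group document names by shared movie/show name: {name: {iso: stem}}."""
    groups: Dict[str, Dict[str, str]] = defaultdict(dict)
    for stem in stems:
        name, lang = split_document_name(stem)
        groups[name][lang] = stem
    return dict(groups)


if __name__ == "__main__":
    print(movie_document_name("name of movie", "DE"))  # NameOfMovie-de
    print(split_document_name("Avatar-en"))            # ('Avatar', 'en')
```

Grouping by the shared name gives all language versions of each movie. That is one way to pick out aligned documents in a multi-parallel corpus before pairing languages.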


Components

processed data



Tags

conll-u, dependency, lemma, parallel corpora, POS, subtitles, UD, xml
