MultiSub
Category: Project
Description: MultiSub is a multi-parallel corpus of movie subtitles. The data in all languages is automatically lemmatized, POS-tagged and syntactically parsed.
This document contains practical information for using the data.
1. NAMING CONVENTIONS
- Document corresponding to a movie's subtitles: name of the movie in camel case, followed by a hyphen and the ISO code of the language: NameOfMovie-de (e.g. Avatar-en)
- Documents corresponding to a series' subtitles: name of the show in camel case (consisting of name, season number and episode number), followed by a hyphen and the ISO code of t…
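The movie naming convention above can be checked mechanically. The following is a minimal sketch, not part of the MultiSub distribution: the function names `parse_document_name` and `split_camel_case` are hypothetical, and it assumes that the language suffix is a lowercase two- or three-letter ISO 639 code and that names are given without a file extension.

```python
import re

# Assumption: the suffix is a lowercase 2- or 3-letter ISO 639 code,
# and the title part contains only letters and digits (camel case).
NAME_PATTERN = re.compile(r"^(?P<title>[A-Za-z0-9]+)-(?P<lang>[a-z]{2,3})$")


def parse_document_name(name):
    """Split a document name such as 'Avatar-en' into (title, language code).

    Hypothetical helper for illustration; raises ValueError when the name
    does not follow the 'NameOfMovie-xx' pattern.
    """
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name!r}")
    return match.group("title"), match.group("lang")


def split_camel_case(title):
    """Break a camel-case title into capitalized words, lowercase runs and digit runs."""
    return re.findall(r"[A-Z][a-z]*|[a-z]+|\d+", title)


if __name__ == "__main__":
    title, lang = parse_document_name("Avatar-en")
    print(title, lang)
    print(split_camel_case("NameOfMovie"))
```

Names that lack the hyphen or the language code are rejected with a `ValueError`, which makes it easy to spot files that do not follow the convention.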