<p>This document contains practical information for using the data.</p>
<p><strong>1. NAMING CONVENTIONS</strong></p>
<ul>
<li>
<p><strong>documents corresponding to a movie's subtitles:</strong> name of the movie in camel case, hyphen, ISO code of the language: NameOfMovie-de (e.g. Avatar-en)</p>
</li>
<li>
<p><strong>documents corresponding to a series' subtitles:</strong> name of the show in camel case followed by the season number and episode number, hyphen, ISO code of the language: ShowS05E10-de (e.g. breakingBadS01E01-fr)</p>
</li>
<li>
<p><strong>name of linking file:</strong> name of the series/movie, hyphen, ISO code of the source language, underscore, ISO code of the target language: breakingBadS01E01-en_nl</p>
</li>
<li>
<p><strong>Portuguese and Spanish subtitles:</strong> LA (for Latin American) is appended to the language code to distinguish Latin American subtitles from European ones:</p>
<ul>
<li>howIMetYourMotherS01E01-spLA.srt (from Latin America)</li>
<li>howIMetYourMotherS01E01-sp.srt (from Spain)</li>
<li>HowIMetYourMotherS01E01-ptLA.srt (from Brazil)</li>
<li>HowIMetYourMotherS01E01-pt.srt (from Portugal)</li>
</ul>
</li>
<li>
<p><strong>Relation names (relations linking sentences in source and target languages):</strong></p>
<ul>
<li>
<p>1-to-1:</p>
<ul>
<li>sentence 5 in English = sentence 5 in French</li>
</ul>
</li>
<li>
<p>many-to-1:</p>
<ul>
<li>sentence 5 in English = sentence 5 in French</li>
<li>sentence 6 in English = sentence 5 in French</li>
</ul>
</li>
<li>
<p>1-to-many:</p>
<ul>
<li>sentence 5 in English = sentence 5 and sentence 6 in French</li>
</ul>
</li>
</ul>
</li>
</ul>
<p><strong>RULES:</strong></p>
<p>1- never change the English sentence IDs!</p>
<ul>
<li>
<p>if 1-to-many: we merge the sentences in the target language, output = 1 line (target merge)</p>
</li>
<li>
<p>if many-to-1: we repeat the line in the target language, output = number of lines in the source language (target repeat)</p>
</li>
<li>
<p>if no match in target language: we do not have to do anything</p>
</li>
<li>if no match in source language: we append the last sentence</li>
</ul>
<p><strong>SCENE ANNOTATION:</strong></p>
<ul>
<li>a scene is defined as a single, coherent time unit (+ same location)</li>
<li>add the speaker and scene ID in the merged tab-delimited format</li>
</ul>
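<p>The target merge and target repeat rules above can be sketched in Python as follows. This is only an illustration, not the code used to build the data: the function name <code>apply_relation</code> and its input shape (a list of English sentence IDs plus a list of target sentences) are hypothetical. Joining merged sentences with a single space and emitting nothing when the target has no match are assumptions. The "no match in source language" rule is left out, because the rules do not specify how the append is carried out.</p>

```python
from typing import List, Tuple


def apply_relation(source_ids: List[int],
                   target_sentences: List[str]) -> List[Tuple[int, str]]:
    """Return (English sentence ID, target line) pairs for one relation.

    The English sentence IDs are never changed; only the target side
    is merged or repeated.
    """
    if not source_ids:
        # "No match in source language" is not specified precisely enough
        # in the rules to implement here (assumption: caller handles it).
        raise ValueError("no source sentence: not handled in this sketch")

    if not target_sentences:
        # No match in target language: nothing to do (assumption: emit nothing).
        return []

    if len(source_ids) == 1:
        # 1-to-1 and 1-to-many: merge all target sentences into one line.
        # Joining with a single space is an assumption.
        return [(source_ids[0], " ".join(target_sentences))]

    if len(target_sentences) == 1:
        # many-to-1: repeat the target line once per source sentence.
        return [(sid, target_sentences[0]) for sid in source_ids]

    # many-to-many is not covered by the rules.
    raise ValueError("many-to-many relations are not covered by the rules")


if __name__ == "__main__":
    print(apply_relation([5], ["Oui.", "Merci."]))  # target merge
    print(apply_relation([5, 6], ["Oui."]))         # target repeat
```

<p>In both cases the output has exactly one line per English sentence ID, which is what keeps the English IDs stable.</p>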

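<p>The naming conventions in section 1 can be built and parsed mechanically. The sketch below is a minimal illustration: the helper names (<code>subtitle_name</code>, <code>linking_name</code>, <code>parse_name</code>) are hypothetical, and the regular expression assumes that language codes are two lowercase letters with an optional <code>LA</code> suffix and that the <code>.srt</code> extension is optional.</p>

```python
import re
from typing import Dict, Optional

# Assumed pattern: title, optional SxxEyy, hyphen, source code,
# optional underscore + target code, optional ".srt".
# Groups: 1 = title, 2 = season, 3 = episode, 4 = source, 5 = target.
NAME_RE = re.compile(
    r"^([A-Za-z0-9]+?)(?:S(\d{2})E(\d{2}))?"
    r"-([a-z]{2}(?:LA)?)(?:_([a-z]{2}(?:LA)?))?(?:\.srt)?$"
)


def subtitle_name(title: str, lang: str,
                  season: Optional[int] = None,
                  episode: Optional[int] = None) -> str:
    """Build a subtitle document name, e.g. Avatar-en or breakingBadS01E01-fr."""
    name = title
    if season is not None and episode is not None:
        name += f"S{season:02d}E{episode:02d}"
    return f"{name}-{lang}"


def linking_name(base: str, source_lang: str, target_lang: str) -> str:
    """Build a linking file name, e.g. breakingBadS01E01-en_nl."""
    return f"{base}-{source_lang}_{target_lang}"


def parse_name(name: str) -> Dict[str, object]:
    """Split a document or linking file name into its parts."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"name does not follow the convention: {name}")
    title, season, episode, src, tgt = match.groups()
    return {
        "title": title,
        "season": int(season) if season is not None else None,
        "episode": int(episode) if episode is not None else None,
        "src": src,
        "tgt": tgt,  # None for plain subtitle documents
    }


if __name__ == "__main__":
    print(subtitle_name("breakingBad", "fr", 1, 1))
    print(parse_name("howIMetYourMotherS01E01-spLA.srt"))
```

<p>A name with a target code (such as breakingBadS01E01-en_nl) parses as a linking file; a name without one parses as a single-language subtitle document.</p>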