studyforrest-paper-speechannotation

Category: Project

Description: Here we present an annotation of speech in the audio-visual movie "Forrest Gump" and its audio-description for a visually impaired audience, as an addition to a large public functional brain imaging dataset (studyforrest.org). The annotation provides the exact timing of each of the more than 2500 spoken sentences, 16000 words (including 202 non-speech vocalizations), and 66000 phonemes, together with the corresponding speaker. Additionally, for every word, we provide lemmatization, a simple part-of-speech tagging (15 grammatical categories), a detailed part-of-speech tagging (43 grammatical categories), syntactic dependencies, and a semantic analysis based on word embeddings that represent each word in a 300-dimensional semantic space. To validate the dataset's quality, we build a model of hemodynamic brain activity based on information drawn from the annotation. Results suggest that the annotation's content and quality enable independent researchers to create models of brain activity correlating with a variety of linguistic aspects under conditions of near-real-life complexity.

License: CC-By Attribution 4.0 International

Wiki

This project contains three components:

1. the annotation as a tab-separated-value (TSV) formatted table and a text-based TextGrid file (the native format of the software Praat).
2. the data of the analysis that we ran as a validation of the annotation's content and quality.
3. the corresponding paper that is hosted on GitHub and published in F1000Research.

Components 1 and 2 were built from a DataLad ...
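As a rough illustration of working with a tab-separated annotation table like the one in component 1, the sketch below parses a few rows and counts words per speaker. The column names (`onset`, `duration`, `speaker`, `text`), the sample rows, and the helper functions are assumptions made for this example only; they are not taken from the published file, whose actual layout should be checked before adapting the code.

```python
import csv
import io

# Hypothetical excerpt: column names and values are illustrative only,
# not copied from the published annotation file.
SAMPLE_TSV = (
    "onset\tduration\tspeaker\ttext\n"
    "1.50\t0.40\tSPEAKER_A\thello\n"
    "1.90\t0.60\tSPEAKER_B\tworld\n"
    "2.50\t0.30\tSPEAKER_A\tagain\n"
)


def load_events(tsv_text):
    """Parse tab-separated rows into dicts with numeric onset and duration."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    events = []
    for row in reader:
        row["onset"] = float(row["onset"])
        row["duration"] = float(row["duration"])
        events.append(row)
    return events


def words_by_speaker(events):
    """Count how many annotated rows belong to each speaker."""
    counts = {}
    for event in events:
        counts[event["speaker"]] = counts.get(event["speaker"], 0) + 1
    return counts


events = load_events(SAMPLE_TSV)
counts = words_by_speaker(events)
```

To read a real file instead of the inline sample, the same `csv.DictReader` call can be given an open file handle, with the column names adjusted to match the file's header.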

Components

  • studyforrest-speechannotation

    This component contains the annotation of speech spoken in the research cut (Hanke et al., 2014; Hanke et al., 2016) of the movie "Forrest Gump" (Zemec...

  • studyforrest-speechanno-validation

    This component contains the data of the analysis that we ran as a validation of the annotation of speech spoken in the research cut (Hanke et al., 201...
