Building and curating conversational corpora for diversity-aware language science and technology

Andreas Liesenfeld; Mark Dingemanse

doi:10.17605/OSF.IO/CWVBE

Title	Authors

Building and curating conversational corpora for diversity-aware language science and technology

Contributors:

Date created: | Last Updated:

: DOI | ARK

Creating DOI. Please wait...

Create DOI

Category: Project

Description: We present an analysis pipeline and best practice guidelines for building and curating corpora of everyday conversation in diverse languages. Surveying language documentation corpora and other resources that cover 67 languages and varieties from 28 phyla, we describe the compilation and curation process, specify minimal properties of a unified format for interactional data, and develop methods for quality control that take into account turn-taking and timing. Two case studies show the broad utility of conversational data for (i) charting human interactional infrastructure and (ii) tracing challenges and opportunities for current ASR solutions. Linguistically diverse conversational corpora can provide new insights for the language sciences and stronger empirical foundations for language technology.

Projects
Registrations

Results: All Projects Results: My Projects Results: All Registrations Results: My Registrations

Files

Loading files...

Citation

Recent Activity

Loading logs...

OSF does not support the use of Internet Explorer. For optimal performance, please switch to another browser.

This website relies on cookies to help provide a better user experience. By clicking Accept or continuing to use the site, you agree. For more information, see our Privacy Policy and information on cookie use.

Start managing your projects on the OSF today.

Free and easy to use, the Open Science Framework supports the entire research lifecycle: planning, execution, reporting, archiving, and discovery.

Create an Account Learn More Hide this message

Main content

Links to this project

Building and curating conversational corpora for diversity-aware language science and technology

Link other OSF projects

Files

Citation

Recent Activity

Start managing your projects on the OSF today.