BabbleCor: A Crosslinguistic Corpus of Babble Development in Five Languages

doi:10.17605/OSF.IO/RZ4TX

**What is BabbleCor?** BabbleCor is a crosslinguistic corpus of infant and child vocalizations from 52 children exposed to five different languages: English, Spanish, Tsimane', Yêlí-Dnye, Tseltal Mayan, and bilingual Quechua-Spanish.

**How was BabbleCor created?** BabbleCor consists of very short audio clips (approximately 400 ms) of child vocalizations. To generate these clips, each child first completed a daylong audio recording, 6 to 16 hours long, in which the child wore a small, lightweight recorder inside a clothing pocket designed for the device. Child vocalizations in these daylong recordings were identified either by hand or by the proprietary Language ENvironment Analysis (LENA) algorithm, which assigns utterances in naturalistic audio recordings to speaker categories (e.g., Female Adult, Child). One hundred of the utterances identified as child vocalizations were then randomly selected and cut into the shorter clips that make up BabbleCor.

**Where do the BabbleCor clip annotations come from?** Each short clip (~400 ms) was categorized according to a 5-way scheme by citizen-science annotators on the iHEARu PLAY platform (https://www.ihearu-play.eu/). Annotators classified each clip as one of:

1. canonical: containing a consonant-to-vowel transition
2. non-canonical: not containing a consonant-to-vowel transition
3. crying
4. laughing
5. junk

For further details on corpus creation, please see the Methods section of [Cychosz et al. (2021)][1].

**What are the metadata?** BabbleCor has two metadata components: *[Annotation_Tags][3]* and *[Public_Metadata][4]*. As the name suggests, *Public_Metadata* contains corpus metadata that is publicly available to all corpus users: child ID, child age, child's assigned gender, corpus of origin, and clip ID. *Annotation_Tags* contains the annotation tags for each clip ID, such as canonical babble or laughing.
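The clip-generation step described under *How was BabbleCor created?* (random selection of 100 child vocalizations, which are then cut into ~400 ms clips) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Segment` type and `sample_and_chop` function are hypothetical, vocalizations are assumed to arrive as onset/offset times in milliseconds, each utterance is assumed to be cut into back-to-back 400 ms windows, and any trailing remainder shorter than 400 ms is assumed to be discarded.

```python
import random
from typing import List, NamedTuple, Tuple


class Segment(NamedTuple):
    """A child vocalization in a daylong recording, in ms (hypothetical structure)."""
    onset_ms: int
    offset_ms: int


def sample_and_chop(segments: List[Segment], n_utterances: int = 100,
                    clip_ms: int = 400, seed: int = 0) -> List[Tuple[int, int]]:
    """Randomly sample up to n_utterances vocalizations and cut each into
    consecutive clip_ms windows, returned as (start_ms, end_ms) pairs."""
    rng = random.Random(seed)  # seeded so the selection is reproducible
    chosen = rng.sample(segments, min(n_utterances, len(segments)))
    clips = []
    for seg in sorted(chosen):  # process in recording order
        # Assumption: a trailing piece shorter than clip_ms is dropped,
        # so utterances shorter than clip_ms yield no clips at all.
        for start in range(seg.onset_ms, seg.offset_ms - clip_ms + 1, clip_ms):
            clips.append((start, start + clip_ms))
    return clips


if __name__ == "__main__":
    segs = [Segment(0, 1000), Segment(5000, 5300), Segment(9000, 9800)]
    print(sample_and_chop(segs))
    # [(0, 400), (400, 800), (9000, 9400), (9400, 9800)]
```

Working in integer milliseconds avoids the floating-point drift that repeatedly adding 0.4 s would introduce.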
For access to the annotation tags, please sign, scan, and email the data sharing agreement (see *[Data_Sharing_Agreement][5]*) to babblecorpus@gmail.com.

[1]: https://pubmed.ncbi.nlm.nih.gov/33497512/
[2]: https://psyarxiv.com/9vzs5/
[3]: https://osf.io/2n456/
[4]: https://osf.io/rau7f/
[5]: https://osf.io/64puz/
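Once both metadata components are available, a typical first step is to join the annotation tags to the public metadata by clip ID. The Python sketch below does this with the standard library only and computes one common summary for each child: the proportion of canonical clips among speech-like clips, canonical / (canonical + non-canonical). The column names (`clip_ID`, `child_ID`, `label`) and the label strings are assumptions, so check them against the headers of the downloaded files. The sketch also assumes each row holds a single annotation; if a file stores several annotator judgments per clip, each judgment is counted separately.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; adjust to the actual file headers.
CLIP_COL, CHILD_COL, LABEL_COL = "clip_ID", "child_ID", "label"

# The five annotation categories (exact label strings are assumed).
CATEGORIES = {"canonical", "non-canonical", "crying", "laughing", "junk"}


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def canonical_proportion_by_child(metadata_rows, tag_rows):
    """Join tags to metadata by clip ID and return, per child,
    canonical / (canonical + non-canonical)."""
    child_of_clip = {row[CLIP_COL]: row[CHILD_COL] for row in metadata_rows}
    counts = defaultdict(lambda: {"canonical": 0, "non-canonical": 0})
    for row in tag_rows:
        label = row[LABEL_COL]
        if label not in CATEGORIES:
            raise ValueError(f"unexpected label: {label!r}")
        child = child_of_clip.get(row[CLIP_COL])
        # Skip clips missing from the metadata and non-speech labels
        # (crying, laughing, junk).
        if child is None or label not in ("canonical", "non-canonical"):
            continue
        counts[child][label] += 1
    # Every child in counts has at least one speech-like clip, so no division by zero.
    return {
        child: c["canonical"] / (c["canonical"] + c["non-canonical"])
        for child, c in counts.items()
    }


if __name__ == "__main__":
    meta = read_rows("clip_ID,child_ID\nc1,A\nc2,A\nc3,A\nc4,B\n")
    tags = read_rows("clip_ID,label\nc1,canonical\nc2,non-canonical\n"
                     "c3,crying\nc4,canonical\n")
    print(canonical_proportion_by_child(meta, tags))
    # {'A': 0.5, 'B': 1.0}
```

With real files, `read_rows(open(path, newline="").read())` or `csv.DictReader` over an open file handle can replace the inline sample text.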