This OSF archive contains all datasets and analysis scripts for a project analyzing early second-language vocabulary learning in a large dataset, by [Elise Hopman][1], [Bill Thompson][2], [Joe Austerweil][3] and [Gary Lupyan][4]. It also contains the PDF of the six-page CogSci paper and the slides for the talk presented at CogSci 2018.

The regression analysis itself is in the file 'script7_duolingo_regression_analysis.R' and was run on the (trimmed) dataset 'dataset6_duolingo_analyzed_corpus.csv'. The folder 'scripts and data' contains all Python and R scripts, as well as the .txt and .csv data files we used to get from the original Duolingo learning traces dataset released by [Settles & Meeder (2016)][5] to the analyzed corpus with psycholinguistic predictors. Most of our coding work consisted of merging the Duolingo data with other corpora, so scripts 1-6 deal with building the dataset.

We have made all scripts and intermediate datasets available in case they are of interest to anyone. The **most useful dataset for other researchers interested in investigating the Duolingo dataset from a psycholinguistic point of view** is the file on this OSF storage named 'dataset5_duolingo_full_corpus.csv'.

If you have any questions about any of these data or scripts, please feel free to contact us at

[1]:
[2]:
[3]:
[4]:
[5]:
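For a quick first look at 'dataset5_duolingo_full_corpus.csv' before working with it in R, a minimal Python sketch like the one below may help. It only reports the header row and the number of data rows. It assumes the file is a comma-separated file whose first row holds column names and that it is UTF-8 encoded; the column names themselves are not documented on this page, so the sketch does not assume any. The function names `summarize_corpus` and `summarize_corpus_file` are hypothetical helpers for illustration, not part of the archived scripts.

```python
import csv
from typing import Iterable, List, Tuple


def summarize_corpus(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Return (header, number_of_data_rows) for CSV text.

    Assumption: the first row holds column names. No specific column
    names are assumed, since they are not documented on this page.
    """
    reader = csv.reader(lines)
    header = next(reader, [])                 # first row: column names ([] if input is empty)
    n_rows = sum(1 for row in reader if row)  # count data rows, skipping blank lines
    return header, n_rows


def summarize_corpus_file(path: str) -> Tuple[List[str], int]:
    """Open a CSV file (e.g. 'dataset5_duolingo_full_corpus.csv') and summarize it."""
    # newline="" is the setting the csv module documentation recommends for file objects;
    # the UTF-8 encoding is an assumption about how the file was saved.
    with open(path, newline="", encoding="utf-8") as handle:
        return summarize_corpus(handle)
```

For example, `header, n_rows = summarize_corpus_file("dataset5_duolingo_full_corpus.csv")` returns the column names and row count of the downloaded file. The actual analysis in this project was done in R ('script7_duolingo_regression_analysis.R'); this sketch is only meant for inspecting the data.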