<p>This OSF archive contains all datasets and analysis scripts for the project analyzing early second-language vocabulary learning in a large dataset, by <a href="https://github.com/duolingo/halflife-regression" rel="nofollow">Elise Hopman</a>, <a href="https://billdthompson.github.io" rel="nofollow">Bill Thompson</a>, <a href="https://alab.psych.wisc.edu/people/" rel="nofollow">Joe Austerweil</a> and <a href="http://sapir.psych.wisc.edu" rel="nofollow">Gary Lupyan</a>. It also contains the PDF of the six-page CogSci paper write-up, as well as the slides for the talk presented at CogSci 2018.</p>
<p>The regression analysis itself is in the file 'script7_duolingo_regression_analysis.R' and was run on the (trimmed) dataset 'dataset6_duolingo_analyzed_corpus.csv'.</p>
<p>The folder 'scripts and data' contains all Python and R scripts, as well as the .txt and .csv data files we used to get from the original Duolingo learning-traces dataset released by <a href="https://billdthompson.github.io" rel="nofollow">Settles &amp; Meeder (2016)</a> to the corpus with psycholinguistic predictors that we analyzed. Most of our coding work consisted of merging the Duolingo data with other corpora, so scripts 1-6 deal with creating the dataset. We have made all scripts and intermediate datasets available in case they are of interest; the <strong>most useful dataset for other researchers interested in investigating the Duolingo dataset from a psycholinguistic point of view</strong> is the file in this OSF storage named 'dataset5_duolingo_full_corpus.csv'.</p>
<p>If you have any questions about these data or scripts, please feel free to contact us at hopman@wisc.edu.</p>
