<p>This repo contains the data reported in Hartshorne, Tenenbaum, & Pinker. A Critical Period for Second Language Acquisition: Evidence from 2/3 Million English Speakers.</p> <h2>Overview of files</h2> <ul> <li><strong>compiled.csv</strong> is the raw data and includes excluded subjects and items.</li> <li><strong>data.csv</strong> contains only the subjects and items that were analyzed. It also contains several derived variables (see description below).</li> <li><strong>processing.R</strong> contains an R script that converts the data in compiled.csv into the data in data.csv.</li> </ul> <p>Note the following caveat: subjects had opportunities to leave comments. These comments sometimes contained identifying information, including email addresses. Because we were unable to ensure that all identifying information had been censored, we stripped all comments from the data before uploading.</p> <h2>Data columns</h2> <p>(columns marked <code>N/A</code> are more complicated to explain than useful)</p> <p><strong>id</strong> Unique subject ID</p> <p><strong>date</strong> Date at start of experiment</p> <p><strong>time</strong> Time at start of experiment</p> <p><strong>gender</strong> gender</p> <p><strong>age</strong> age</p> <p><strong>natlangs</strong> List of subject's native languages</p> <p><strong>primelangs</strong> List of subject's primary languages <em>now</em></p> <p><strong>dyslexia</strong> Did subject report difficulty with reading?</p> <p><strong>psychiatric</strong> Did subject report any psychiatric disorders?</p> <p><strong>education</strong> highest level of education</p> <p><strong>tests</strong> N/A</p> <p><strong>Eng_start</strong> age at start of English learning</p> <p><strong>Eng_country_yrs</strong> number of years living in English-speaking country</p> <p><strong>house_Eng</strong> subject lives with any native English speakers?</p> <p><strong>dictionary</strong> subject reported using a dictionary to complete experiment</p> 
<p><strong>already_participated</strong> Subject reported prior participation in experiment</p> <p><strong>countries</strong> countries lived in</p> <p><strong>currcountry</strong> country currently lived in</p> <p><strong>US_region</strong> regions of USA lived in</p> <p><strong>UK_region</strong> regions of UK lived in</p> <p><strong>Can_region</strong> regions of Canada lived in</p> <p><strong>Ebonics</strong> speaker of Ebonics?</p> <p><strong>Ir_region</strong> county of Ireland lived in</p> <p><strong>UK_constituency</strong> constituency of UK lived in</p> <p><strong>nat_Eng</strong> native speaker of English</p> <p><strong>prime_Eng</strong> primary language is English</p> <p><strong>speaker_cat</strong> N/A</p> <p><strong>type</strong> N/A</p> <p><strong>lived_Eng_per</strong> Percentage of years speaking English that the subject lived in an English-speaking country.</p> <p><strong>Eng_little</strong> values are monoeng (native speaker of English only), bileng (native speaker of English + at least one other lang), lot (immersion learner), little (non-immersion learner).</p> <p><strong>correct</strong> percentage of critical items correct</p> <p><strong>elogit</strong> elogit of <code>correct</code></p> <p><strong>natcon</strong> N/A</p> <p><strong>primeeng</strong> N/A</p> <p><strong>edtype</strong> N/A</p> <hr> <p><strong>q1, q2, etc.</strong>:</p> <p>The remaining columns are of the form <code>q1</code>, <code>q2</code>, etc. These are the responses to individual questions. Multi-part questions are coded such that <code>q11_1</code>, <code>q11_2</code>, <code>q11_3</code>, and <code>q11_4</code> are parts a, b, c, and d (respectively) of question 11.</p> <p>The values of these columns differ between the two data files. In <code>compiled.csv</code>, grammaticality judgments are coded as <code>0</code> = ungrammatical; <code>1</code> = grammatical. For picture-choice items, <code>0</code> = picture on left; <code>1</code> = picture on right. 
In <code>data.csv</code>, for all item types, the values are <code>1</code> for a correct answer and <code>0</code> for an incorrect answer. <strong>Note</strong> that <code>data.csv</code> only contains columns for critical questions. (By definition, non-critical questions don't have a single correct answer.)</p>
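<p>The column naming and the two coding schemes described above can be illustrated with a minimal Python sketch. This is not the logic of <code>processing.R</code>; the function names <code>parse_question_column</code> and <code>score_row</code>, the <code>answer_key</code> mapping, and the example row are all hypothetical and exist only for illustration. The sketch assumes that each critical item has one expected raw response (<code>0</code> or <code>1</code>) and that a response counts as correct when it matches that expected value.</p>

```python
import re

# Matches "q11" or "q11_2"; the optional suffix is the 1-based part index.
_COLUMN_RE = re.compile(r"^q(\d+)(?:_(\d+))?$")


def parse_question_column(name):
    """Split a column name such as 'q11_2' into (question number, part letter)."""
    match = _COLUMN_RE.match(name)
    if match is None:
        return None  # not a question column (e.g. 'id', 'age')
    question = int(match.group(1))
    part = match.group(2)
    # Part 1 -> 'a', part 2 -> 'b', ...; None for single-part questions.
    letter = chr(ord("a") + int(part) - 1) if part is not None else None
    return question, letter


def score_row(raw_row, answer_key):
    """Recode raw 0/1 responses as 1 = correct, 0 = incorrect.

    Only columns listed in answer_key (assumed to be the critical items)
    are kept, mirroring the fact that data.csv omits non-critical questions.
    """
    scored = {}
    for column, expected in answer_key.items():
        response = raw_row.get(column)
        if response in (None, ""):
            continue  # missing response: leave it out rather than guess
        scored[column] = 1 if int(response) == expected else 0
    return scored


# Hypothetical answer key and raw row, for illustration only.
answer_key = {"q1": 1, "q11_1": 0, "q11_2": 1}
raw_row = {"id": "42", "q1": "1", "q11_1": "1", "q11_2": "1", "q20": "0"}

print(parse_question_column("q11_2"))  # (11, 'b')
print(score_row(raw_row, answer_key))  # {'q1': 1, 'q11_1': 0, 'q11_2': 1}
```

<p>In this sketch, <code>q20</code> is dropped because it is absent from the hypothetical answer key, which corresponds to a non-critical question having no single correct answer.</p>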
