This repository contains the R code and data for the manuscript titled "Effects of availability, contingency, and formulaicity on the accuracy of English grammatical morphemes in second language writing". The corpora used in the study are [EFCAMDAT][1] and [full-text COCA][2], both of which are publicly available. The files included in this repository are as follows:

1. **R_Code.html** includes the R code used for the preprocessing, extraction, and analysis of data. All computation was carried out using the High Performance Computing (HPC) service at the University of Birmingham.
2. **mor.slim.RData** includes four R data frames: *ed.slim*, *ing.slim*, *tps.slim*, and *ps.slim*, each of which corresponds to a morpheme. These are the data submitted to the statistical analyses. Please see **R_Code.html** for how these data frames were created. Each data frame includes the following columns:
   - `writingID`: Unique ID assigned to each writing
   - `mor`: Inflected form
   - `lemma`: The lemma of the above
   - `error.type`: Whether the given instance is an error (*OMMF*) or an accurate use (*ACC*)
   - `learnerID`: Unique ID assigned to each learner
   - `nationality`: Nationality of the learner (e.g., *cn* = Chinese)
   - `level`: Englishtown level at which the essay including the target (non-)occurrence of the morpheme was submitted
   - `prof`: Proficiency level of the learner
   - `writingno`: Writing number (e.g., 1 for the first writing of a learner, 2 for the second writing, and so forth)
   - `lemma.freq`: Frequency of the `lemma` in COCA
   - `surface.freq`: Frequency of the `mor` in COCA
   - `reliability`: Reliability of the `mor` calculated based on COCA
   - `dp.max`: Maximum standardized log-transformed $\Delta{P}$
   - `fs`: The specific n-gram in which `dp.max` was calculated
   - `freq.log`: Log-transformed `surface.freq`
   - `topicID`: Unique ID assigned to each topic
   - `prof.s`, `writingno.s`, `freq.log.s`, `reliability.s`, and `dp.max.s`: Standardized versions of the corresponding variables
3. The **CSV folder** contains four CSV files corresponding to *ed.slim*, *ing.slim*, *tps.slim*, and *ps.slim*. They are primarily for those who wish to analyze the data with tools other than R.
4. **irregular.past.list.csv** and **irregular.plural.list.csv**: These CSV files contain the lists of irregular verbs (**irregular.past.list.csv**) and irregular plural forms (**irregular.plural.list.csv**). Please see **R_Code.html** for how they were used in data extraction.

[**Update on 27 April, 2022**: We have fixed a few presentational issues in the R_Code.html file.]

[1]: https://philarion.mml.cam.ac.uk/
[2]: https://www.corpusdata.org/
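
For readers working from the CSV files rather than R, the following is a minimal sketch of how the `error.type` column might be summarized as an accuracy rate per group. It is not part of the authors' analysis (see **R_Code.html** for that). The helper name `accuracy_by`, the file path mentioned in the comment, and the assumption that the CSV headers match the column names listed above are all hypothetical, and the rows in the example are invented toy data, not values from the dataset.

```python
import csv
import io
from collections import defaultdict


def accuracy_by(rows, key):
    """Return {group: share of rows whose error.type is 'ACC'} (hypothetical helper)."""
    counts = defaultdict(lambda: [0, 0])  # group -> [accurate uses, all instances]
    for row in rows:
        group = row[key]
        counts[group][1] += 1
        if row["error.type"] == "ACC":
            counts[group][0] += 1
    return {group: acc / total for group, (acc, total) in counts.items()}


# Toy rows invented purely for illustration; they are NOT taken from the dataset.
# With the real data, one would instead open a file from the CSV folder, e.g.
#   with open("CSV/ed.slim.csv", newline="") as f:   # path and file name are assumptions
#       rows = list(csv.DictReader(f))
sample = io.StringIO(
    "writingID,mor,lemma,error.type,nationality\n"
    "w1,walked,walk,ACC,cn\n"
    "w2,played,play,OMMF,cn\n"
    "w3,asked,ask,ACC,br\n"
)
rows = list(csv.DictReader(sample))
result = accuracy_by(rows, "nationality")
print(result)
```

Passing a different column name as `key` (for example `level` or `lemma`) groups the same accuracy computation by that variable instead.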