Balota, D. A., Yap, M. J., Cortese, M. J., Hutchison, K. A., Kessler, B., Loftis, B., Neely, J. H., Nelson, D. L., Simpson, G. B., & Treiman, R. (2007). The English Lexicon Project. Behavior Research Methods, 39, 445-459.

The English Lexicon Project is a multiuniversity effort to provide a standardized behavioral and descriptive data set for 40,481 words and 40,481 nonwords. It is available via the Internet at elexicon.wustl.edu. Data from 816 participants across six universities were collected in a lexical decision task (approximately 3,400 responses per participant), and data from 444 participants were collected in a speeded naming task (approximately 2,500 responses per participant). The present paper describes the motivation for this project, the methods used to collect the data, and the search engine that affords access to the behavioral measures and descriptive lexical statistics for these stimuli.

## November, 2021 updates

The raw data files in `ldt_raw.zip` and `nmg_raw.zip` have been updated to address some of the issues raised when parsing them with scripts in the [Julia language](https://julialang.org). The files `ldt.patch` and `nmg.patch` document the changes that were made. Several of the raw data files repeated subject numbers from other files. Some of these conflicts could be resolved; the files that could not be resolved were added to "skip" lists in the processing. These files remain in the raw collections in case later investigation shows how to resolve the conflicts, but they should not be used in analyses at present.
The skip lists are:

```julia
SKIPLIST = [  # list of redundant or questionable files
    "9999.LDT",
    "793DATA.LDT",
    "Data999.LDT",
    "Data1000.LDT",
    "Data1010.LDT",
    "Data1016.LDT",
]
```

for the lexical decision task and

```julia
SKIPLIST = String[  # list of redundant or questionable files
    "Data2815.NMG",
    "Data2816.NMG",
    "Data2817.NMG",
    "Data2818.NMG",
    "Data2819.NMG",
    "Data2820.NMG",
    "Data2821.NMG",
    "Data2778.NMG",
    "Data2779.NMG",
    "Data4140.NMG",
    "Data4100.NMG",
    "Data4140.NMG",
    "Data4110.NMG",
    "Data4140.NMG",
    "283DATA.NMG",
    "Data3872.NMG",
    "Data3882.NMG",
    "Data3884.NMG",
    "Data3886.NMG",
    "Data3894.NMG",
    "371DATA.NMG",
    "Data3911.NMG",
    "Data3912.NMG",
    "Data3930.NMG",
    "Data4210.NMG",
    "Data4118.NMG",
    "Data4119.NMG",
    "Data5255.NMG",
]
```

for the naming task.

The repository [https://github.com/dmbates/EnglishLexicon.jl](https://github.com/dmbates/EnglishLexicon.jl) contains the code to extract the item-, subject-, and trial-level data from these raw data files and store them as [Arrow](https://arrow.apache.org) files. These files can be read directly into R using `arrow::read_feather` or into Python/Pandas using the `pyarrow` package.
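As a minimal sketch of loading one of the resulting Arrow files in Julia (the file name `ldt_trial.arrow` is an assumption here; see the repository for the actual artifact names):

```julia
using Arrow, DataFrames   # Arrow.jl and DataFrames.jl packages

# Hypothetical file name; substitute the Arrow file produced by EnglishLexicon.jl.
tbl = Arrow.Table("ldt_trial.arrow")  # memory-maps the file; columns load lazily
df  = DataFrame(tbl)                  # materialize as a DataFrame for analysis
```

The equivalent reads are `arrow::read_feather("ldt_trial.arrow")` in R and `pyarrow.feather.read_table("ldt_trial.arrow")` in Python.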