Welcome to the data repository of the CopCo corpus! CopCo is an eye-tracking corpus tailored to both psycholinguistics and natural language processing. The goal is to investigate how different populations read Danish texts. To this end, we record the eye movements of participants reading continuous Danish texts at their own pace. This project has been approved by the Ethics Commission of the Faculty of Humanities of the University of Copenhagen.

Feel free to contact us if you have any questions or feedback. The CopCo corpus is free to use for everyone.

# Project structure

- **'DatasetStatistics'** contains one file with information about the included text materials and one file with the anonymized participant details.
- **'RawData'** contains the EDF and result files saved from the EyeLink recordings. The 'RESULTS_FILE.txt' contains the participant's answers to the comprehension questions and lists the text trials in the order in which they were presented to that specific participant.
- **'FixationReports'** contains the fixation and saccade events generated by the SR DataViewer software.
- **'InterestAreaReports'** contains the character-level fixation information generated by the SR DataViewer software.
- **'ExtractedFeatures'** contains one CSV file per participant with the computed word-level reading metrics. The descriptions of the extracted features can be found [here][1]. Note that these extracted feature files **do not** contain the eye-tracking data or the areas of interest for the comprehension questions; these were excluded.
- **'Experiment'** contains the files for building the experiment in the SR ExperimentBuilder software, as well as the deployed executable EyeLink files for running the experiment.
- The link to the GitHub repository contains the code used for preprocessing and feature extraction.

# Participants

The CopCo corpus contains eye movement data from native Danish speakers, both readers without dyslexia and readers with dyslexia.
Additionally, there is a set of non-native speaking participants. The folder **'DatasetStatistics/'** contains further details for filtering the participants by group.

# Stimuli

The texts used for this experiment were extracted either from the [Danske Taler][2] archive or from the [Danish Wikipedia (Ugens artikler)][3].

# Publications

Please refer to the following publication when using the CopCo data from native Danish speakers without reading disorders:

Hollenstein, Nora, Maria Barrett, and Marina Björnsdóttir. "The Copenhagen Corpus of Eye Tracking Recordings from Natural Reading of Danish Texts." In *Proceedings of LREC*. 2022. [PDF][4]

Please refer to the following publication when using the CopCo data from native Danish speakers with dyslexia:

Björnsdóttir, Marina, Nora Hollenstein, and Maria Barrett. "Dyslexia Prediction from Natural Reading of Danish Texts." In *Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa)*. 2023. [PDF][5]

Please refer to the OSF citation when using data from non-native speaking readers.

[1]:
[2]:
[3]:
[4]:
[5]:
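The word-level metrics are split into one CSV file per participant, so an analysis typically begins by combining these files and selecting one participant group. The following is a minimal sketch that uses only the Python standard library. It assumes that each file in 'ExtractedFeatures' is named after the participant ID and that the participant details in 'DatasetStatistics' are available as a CSV file. The function names, the column names `participant_id` and `group`, and the group labels are hypothetical placeholders. Adapt them to the actual files.

```python
import csv
from pathlib import Path


def participants_in_group(stats_csv, group, id_col="participant_id", group_col="group"):
    """Return the set of participant IDs whose group column equals `group`.

    Hypothetical: the participant details are assumed to be a CSV file with
    an ID column and a group column; check the real column names first.
    """
    with open(stats_csv, newline="", encoding="utf-8") as f:
        return {row[id_col] for row in csv.DictReader(f) if row[group_col] == group}


def load_extracted_features(features_dir, participant_ids=None):
    """Combine the per-participant feature CSVs into one list of row dicts.

    Assumption: each file is named after its participant (e.g. "P01.csv"),
    so the file stem is used as the participant ID. The stem is stored under
    a hypothetical "participant" key and overwrites any column of that name.
    """
    rows = []
    for path in sorted(Path(features_dir).glob("*.csv")):
        pid = path.stem
        if participant_ids is not None and pid not in participant_ids:
            continue  # skip participants outside the requested group
        with path.open(newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                row["participant"] = pid
                rows.append(row)
    return rows
```

For example, `load_extracted_features("ExtractedFeatures", participants_in_group("DatasetStatistics/participants.csv", "dyslexia"))` would return only the word-level rows for one group. The stats file path and the group label in this call are likewise hypothetical. Loading the rows into a dataframe library afterwards is a matter of preference. The sketch sticks to `csv.DictReader` so that it has no dependencies.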
