# Lost in a story, detached from the words #
#### Eekhof, L. S., Kuijpers, M. M., Faber, M., Gao, X., Mak, M., van den Hoven, E., & Willems, R. M. (2021). Lost in a Story, Detached from the Words. *Discourse Processes*. https://doi.org/10.1080/0163853X.2020.1857619
This OSF page contains the analysis scripts and the data they require. The scripts and data are meant for reviewers and readers of the paper only. The full data set might be published later for use by other researchers.
# About the data
The data are provided in the files described below. The file data_all_OSF.rda contains the eye-tracking data and information about the words that were read, including skipped words and words for which no information is available. The file skipregdata.rda contains the skipping and regression measures. The file questionnaire_data_OSF.rda contains the questionnaire data.
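An .rda file restores its objects under the names they were saved with, which need not match the file name. The sketch below illustrates this with a toy object; the file name `questionnaire_demo.rda` and the toy columns are assumptions made for the example and are not part of the OSF data.

```r
# Toy stand-in for a data set (illustration only, not the real data).
data_total <- data.frame(SUBJECT = c(1, 2), SWAS = c(4.2, 5.1))

# Save it under a file name that differs from the object name.
demo_file <- file.path(tempdir(), "questionnaire_demo.rda")  # hypothetical name
save(data_total, file = demo_file)
rm(data_total)

# load() restores the object under its saved name and returns
# (invisibly) the names of everything it restored.
loaded_names <- load(demo_file)
print(loaded_names)
str(data_total)
```

Capturing the return value of `load()` is a convenient way to find out what an unfamiliar .rda file contains.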
## Variables in data_all_OSF.rda
- **SUBJECT**: subject number (range 1-330; 171 unique subjects): 1-109 = study 2, 201-243 = study 3, 301-330 = study 1
- **STORY**: factor with 10 levels
- **STUDY**: the study from which the data were derived (1-3)
- **POSITION**: absolute position of the word in the sentence, with 1 = first word of the sentence
- **LEMMAFREQUENCY**: lemma frequency of words, obtained from: http://crr.ugent.be/programs-data/subtitle-frequencies/subtlex-nl/downloading (column "FREQlemma")
- **CONCRETENESS**: concreteness scores of words (1-5), obtained from: http://crr.ugent.be/archives/1602 (column "con_m")
- **AOA**: age of acquisition of words, obtained from: http://crr.ugent.be/archives/1602 (column "average_AoA")
- **OTAN**: orthographic neighborhood size of words, obtained from: http://clearpond.northwestern.edu/dutchpond.php (column "F")
- **WORD LENGT**: number of characters of words
- **GAZDUR**: gaze duration on words (i.e., first run dwell time)
- **DOMINANT.POS**: part-of-speech (POS) tags, obtained from: http://crr.ugent.be/programs-data/subtitle-frequencies/subtlex-nl/downloading
- **WORDCLASS**: factor with 2 levels: content or function. Words were labeled as function words if they had one of the following POS-tags: Determiner, Interjection, Numeral, Conjunction, Pronoun, Preposition. All other words (incl. words with no POS-tag) were labeled as content words.
A variable name with the suffix _CS denotes the scaled and centered version of that variable.
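As a minimal sketch of what a scaled and centered variable looks like, the example below standardizes a toy vector with base R's `scale()`. It assumes that "scaled" means division by the standard deviation; the exact procedure used for the _CS variables (for example, over which subset of the data the mean and standard deviation were computed) is not stated here, so the analysis script remains the reference.

```r
# Toy predictor values (illustration only).
x <- c(2, 4, 6, 8)

# scale() subtracts the mean and divides by the standard deviation;
# as.numeric() drops the matrix attributes that scale() returns.
x_cs <- as.numeric(scale(x))

print(round(x_cs, 3))
print(mean(x_cs))  # approximately 0
print(sd(x_cs))    # approximately 1
```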
## Variables in skipregdata.rda
- **PARTICIPANT**: subject number (range 1-330; 171 unique subjects): 1-109 = study 2, 201-243 = study 3, 301-330 = study 1
- **STORY**: factor with 10 levels
- **GAZDUR**: gaze duration on words (i.e., first run dwell time)
- **IA_SKIP**: binary variable that indicates whether the word was skipped (1) or not (0)
- **IA_REGRESSION_OUT**: binary variable that indicates whether a regression was made from the word (1) or not (0)
- **IA_REGRESSION_OUT_COUNT**: numeric variable that indicates how many regressions were made from the word
- **IA_REGRESSION_IN**: binary variable that indicates whether a regression was made to the word (1) or not (0)
- **IA_REGRESSION_IN_COUNT**: numeric variable that indicates how many regressions were made to the word
- **DOMINANT.POS**: part-of-speech (POS) tags, obtained from: http://crr.ugent.be/programs-data/subtitle-frequencies/subtlex-nl/downloading
- **WORDCLASS**: factor with 2 levels: content or function. Words were labeled as function words if they had one of the following POS-tags: Determiner, Interjection, Numeral, Conjunction, Pronoun, Preposition. All other words (incl. words with no POS-tag) were labeled as content words.
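The WORDCLASS rule described above can be sketched as follows. The tag strings are written as they appear in this README; the actual spelling of the tags in DOMINANT.POS may differ, so treat the vector `function_tags` and the toy input as assumptions.

```r
# POS-tags that mark a function word (spelling assumed from the README).
function_tags <- c("Determiner", "Interjection", "Numeral",
                   "Conjunction", "Pronoun", "Preposition")

# Toy POS column, including a word without a POS-tag (NA).
pos <- c("Noun", "Determiner", NA, "Verb", "Preposition")

# %in% returns FALSE for NA, so words without a tag become content words.
wordclass <- factor(ifelse(pos %in% function_tags, "function", "content"),
                    levels = c("content", "function"))

print(data.frame(pos, wordclass))
```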
## Variables in questionnaire_data_OSF.rda
Note: this data set will load in R as "data_total"
- **SUBJECT**: subject number (range 1-330; 171 unique subjects): 1-109 = study 2, 201-243 = study 3, 301-330 = study 1
- **STORY**: factor with 10 levels
- **STUDY**: the study from which the data were derived (1-3)
- **ART**: Dutch Author Recognition Test scores. See: Koopman, E. M. E. (2015). Empathic reactions after reading: The role of genre, personal factors and affective responses. *Poetics, 50*, 62-79. DOI: https://doi.org/10.1016/j.poetic.2015.02.008
- **SWAS**: mean Story World Absorption Scale (SWAS) score, computed over the available SWAS items (i.e., all items in studies 2 and 3, and 11/18 items in study 1). See: Kuijpers, M. M., Hakemulder, F., Tan, E. S., & Doicaru, M. M. (2014). Exploring absorbing reading experiences. *Scientific Study of Literature, 4*(1), 89-122. DOI: https://doi.org/10.1075/ssol.4.1.05kui
- **APPRE**: liking scores on a 1-7 scale (study 1 used a 1-10 scale, which was linearly transformed to 1-7)
- **AGE**
- **GENDER**: note that 'x' means other.
A variable name with the suffix _CS denotes the scaled and centered version of that variable.
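For readers wondering how a 1-10 liking score can be mapped onto a 1-7 scale, the sketch below shows one common linear transformation that maps the endpoints of the old scale onto the endpoints of the new one. The helper `rescale_linear` is hypothetical, and the exact formula used for APPRE is not given in this README, so this is an assumption rather than the authors' code.

```r
# Hypothetical helper: map x linearly from [old_min, old_max]
# onto [new_min, new_max].
rescale_linear <- function(x, old_min, old_max, new_min, new_max) {
  new_min + (x - old_min) * (new_max - new_min) / (old_max - old_min)
}

# Toy liking scores on the 1-10 scale.
appre_1_to_10 <- c(1, 5.5, 10)
appre_1_to_7 <- rescale_linear(appre_1_to_10, 1, 10, 1, 7)
print(appre_1_to_7)  # 1 4 7
```

Under this mapping the scale endpoints and midpoint are preserved: 1 stays 1, 5.5 becomes 4, and 10 becomes 7.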
# About the analysis script
The code used to run the analyses can be found in the R Markdown file under Analyses. If you simply want to read the code and see the output, you can inspect the HTML file. If you wish to run the analyses yourself, you can open the R Markdown file in RStudio and run the code from Step 1 onwards, using the data files supplied. In that case we recommend downloading the fitted models (rda files) that are provided, because running these models from scratch takes a lot of time.
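One way to reuse a saved model instead of refitting it is to load the rda file when it exists and fit the model only otherwise. The sketch below illustrates this pattern with a simple `lm()` on R's built-in `cars` data as a stand-in; the file name `model_demo.rda` and the object name `model_demo` are hypothetical and do not correspond to the models in the analysis script.

```r
# Hypothetical location of a saved model (illustration only).
model_file <- file.path(tempdir(), "model_demo.rda")

if (file.exists(model_file)) {
  # Reuse the saved model: restores the object `model_demo`.
  load(model_file)
} else {
  # Stand-in for a slow model fit; saved for later reuse.
  model_demo <- lm(dist ~ speed, data = cars)
  save(model_demo, file = model_file)
}

print(coef(model_demo))
```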