**DATA RECORDS**
----------------
----------
This repository has quantitative data, textual data, and ancillary media organized participant-wise; it also holds the custom software tools and
other material we used to acquire these data.
A detailed account of every piece of data can be found in the "Index Table - Ancillary Data" and "Index Table - Everything Else" spreadsheets. In these tables, 1 stands for data that are included and valid, -1 for data that are included but failed quality control (invalid), 0 for data that are missing, and IRB for media bearing identifying information that the participant asked us not to release.
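
The coding scheme of the index tables can be sketched as a small lookup. This is a minimal illustration, not part of the released tools: the function names are hypothetical, and the handling of values such as `1.0` assumes that a spreadsheet export may turn integer codes into floats.

```python
# Hypothetical helper; only the four codes come from the index-table legend.
STATUS_LABELS = {
    "1": "included and valid",
    "-1": "included but failed quality control",
    "0": "missing",
    "IRB": "withheld at the participant's request",
}


def describe_status(cell):
    """Map one index-table cell to a human-readable status."""
    key = str(cell).strip()
    try:
        number = float(key)
        # Assumption: exports may store 1 as 1.0; normalize whole numbers.
        if number.is_integer():
            key = str(int(number))
    except ValueError:
        key = key.upper()  # non-numeric codes such as "IRB"
    if key not in STATUS_LABELS:
        raise ValueError(f"unexpected index-table code: {cell!r}")
    return STATUS_LABELS[key]


def is_usable(cell):
    """True only for data that are both included and valid."""
    return describe_status(cell) == STATUS_LABELS["1"]
```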
**Quantitative Data Folder**
----------------------------
----------
The quantitative data folder holds four comma separated value (csv) files, as
well as the voluminous thermal imaging data: (1) Questionnaire Data - 5 KB in size. (2) Physiological Data - 21.8 MB in size. (3) Keyboard Data - 42.7 MB. (4) Report Data - 13.9 KB. (5) Thermal Imaging Data - 1.86 TB.
The first two columns of the csv files hold the participant ID and group,
respectively. The participant ID is coded as $\text{T}_{\text{xyz}}$, while the group assignment takes values from the set [BH, BL, CH, CL]. The remaining csv columns are specific to the corresponding type of data; their description follows.
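
A quick sanity check of these two leading columns might look as follows. This is a sketch under stated assumptions: the placeholder "xyz" in the participant ID is taken to be three digits (for example `T001`), the file is assumed to have one header row, and the function name is hypothetical.

```python
import csv
import io
import re

VALID_GROUPS = {"BH", "BL", "CH", "CL"}
# Assumption: the "xyz" placeholder stands for three digits, e.g. T001.
ID_PATTERN = re.compile(r"^T\d{3}$")


def check_leading_columns(csv_text):
    """Return (row_number, problem) pairs for the first two columns."""
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # assumption: the first row is a header
    for row_number, row in enumerate(reader, start=2):
        if len(row) < 2:
            problems.append((row_number, "fewer than two columns"))
            continue
        participant_id, group = row[0].strip(), row[1].strip()
        if not ID_PATTERN.match(participant_id):
            problems.append((row_number, f"bad participant ID {participant_id!r}"))
        if group not in VALID_GROUPS:
            problems.append((row_number, f"bad group {group!r}"))
    return problems
```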
----------
**Questionnaire Data File**
In the Questionnaire Data file, in addition to the columns holding the participant ID (**Column A**) and group information (**Column B**), there are columns holding biographic data (**Columns C - K**) and other columns holding scores from psychometric inventories (**Columns L - Y**). Specifically:
**Column C: Age:** Age of participants in years.
**Column D: Gender:** Gender of participants [$1\equiv \text{Male}$, $2 \equiv \text{Female}$].
**Column E: Nationality:** Nationality of participants [$1 \equiv \text{United States}$, $2 \equiv \text{Others}$].
**Column F: Other\_Nationality:** Nationality of non-U.S. participants.
**Column G: Native\_Language:** Mother tongue of participants [$1 \equiv \text{English}$, $2 \equiv \text{Others}$].
**Column H: Other\_Native\_Language:** Mother tongue of bilingual participants.
**Column I: Education:** Educational level of participants [$1\equiv \text{High School}$, $2 \equiv \text{Undergraduate}$, $3 \equiv \text{Masters or equivalent}$, $4 \equiv \text{PhD, JD, or equivalent}$].
**Column J: Writing\_Proficiency:** Self-reported writing proficiency of participants on a seven-point Likert scale, where $1 \equiv \text{Not fluent at all}$ and $7 \equiv \text{Very fluent}$.
**Column K: Daily\_Email\_Frequency:** Self-reported daily use of email on a seven-point Likert scale, where $1 \equiv \text{Never}$ and $7 \equiv \text{Very often}$.
**Big Five Inventory (BFI)** - A trait psychometric related to the participant's key personality factors (McCrae & Costa). It has five sub-scales.
- **Column L: BFI\_Agreeableness:** The level of participant's friendliness with score range [9-45].
- **Column M: BFI\_Conscientiousness:** The level of participant's organized nature with score range [9-45].
- **Column N: BFI\_Extraversion:** The level of participant's outgoing nature with score range [8-40].
- **Column O: BFI\_Neuroticism:** The level of participant's nervousness with score range [8-40].
- **Column P: BFI\_Openness:** The level of participant's curiosity with score range [10-50].
**Emotion Regulation Questionnaire (ERQ)** - A trait psychometric related to the participant's ability to regulate emotions (Gross & John). It has two sub-scales.
- **Column Q: ERQ\_Cognitive\_Reappraisal:** The degree to which a participant can change the way s/he thinks about emotion-eliciting events with score range [6-42].
- **Column R: ERQ\_Expressive\_Suppression:** The degree to which a participant can change the way s/he responds to emotion-eliciting events with score range [4-28].
**Column S: Perceived Stress Scale (PSS):** Level of non-specific perceived stress of participants with score range [0-40]. This is a trait psychometric that predicts health-related outcomes associated with appraised stress (Cohen).
**NASA TLX** - A state psychometric administered upon completion of DT to gauge the load this task imposed on participants. NASA TLX (Hart & Staveland) features six sub-scales with a common rating [1=Strongly disagree, 2=Disagree, 3=Somewhat disagree, 4=Neither agree nor disagree, 5=Somewhat agree, 6=Agree, 7=Strongly agree].
- **Column T: NASA\_Mental\_Demand:** Perceived mental load induced by DT.
- **Column U: NASA\_Physical\_Demand:** Perceived physical activity induced by DT.
- **Column V: NASA\_Temporal\_Demand:** Perceived time pressure induced by DT.
- **Column W: NASA\_Performance:** Perceived success in executing DT.
- **Column X: NASA\_Effort:** Perceived amount of work expended to achieve the said level of DT performance.
- **Column Y: NASA\_Frustration:** Perceived level of irritation in performing DT.
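
The score ranges listed above lend themselves to a simple range check when loading the questionnaire file. The sketch below assumes that the csv headers match the column names given here, that the PSS column header is `PSS`, and that each row has been read into a dictionary; `out_of_range` is a hypothetical helper name.

```python
# Ranges as documented above; NASA TLX sub-scales share the 1-7 rating.
SCORE_RANGES = {
    "BFI_Agreeableness": (9, 45),
    "BFI_Conscientiousness": (9, 45),
    "BFI_Extraversion": (8, 40),
    "BFI_Neuroticism": (8, 40),
    "BFI_Openness": (10, 50),
    "ERQ_Cognitive_Reappraisal": (6, 42),
    "ERQ_Expressive_Suppression": (4, 28),
    "PSS": (0, 40),  # assumption: header name for Column S
    "NASA_Mental_Demand": (1, 7),
    "NASA_Physical_Demand": (1, 7),
    "NASA_Temporal_Demand": (1, 7),
    "NASA_Performance": (1, 7),
    "NASA_Effort": (1, 7),
    "NASA_Frustration": (1, 7),
}


def out_of_range(row):
    """Return {column: value} for scores outside their documented range."""
    bad = {}
    for name, (low, high) in SCORE_RANGES.items():
        raw = row.get(name)
        if raw is None or str(raw).strip() == "":
            continue  # missing scores are skipped, not flagged
        value = float(raw)
        if not low <= value <= high:
            bad[name] = value
    return bad
```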
----------
**Physiological Data File**
In the Physiological Data file, in addition to the columns holding the participant ID (**Column A**) and group information (**Column B**), there are columns holding treatment information, task information, timing, and signal data from the various sensing modalities used in the experiment. The recordings of the physiological sensors were synced. Hence, each row from left to right holds the time and the synced set of modal signal values recorded at that time. The temporal resolution is fixed at 1 s across the board to match the slowest physiological channels, Chest BR and Chest HR. The main data repository (Data Citation 1) holds the quality controlled values of the physiological variables (**_QC**). The raw variable values and the R code that operates upon them to implement the processes described herein reside on GitHub (Data Citation 2). In more detail:
**Column C: Treatment:** The treatment during which each set of modal signal values was recorded.
**Column D: Time:** The recorded date and time for each set of modal signal values.
**Column E: Treatment\_Time:** The time elapsed in seconds since the start of the present treatment.
**Column F: Task:** Labeling of email vs. report writing activity during DT.
**Column G: PP:** Values of the perinasal perspiration signal in $^{\circ}\text{C}^2$.
**Column H: EDA:** Values of the EDA signal in $\mu \text{S}$, measured with E4 on the wrist of the participant's non-dominant hand.
**Column I: BR:** Values of the breathing rate signal in BPM, measured with the BioHarness on the participant's chest.
**Column J: Chest\_HR:** Values of the heart rate signal in BPM, measured with the BioHarness on the participant's chest.
**Column K: Wrist\_HR:** Values of the heart rate signal in BPM, measured with E4 on the wrist of the participant's non-dominant hand.
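
Because every row carries the participant, the treatment, and one synced sample per channel, per-treatment summaries reduce to a grouped average. The following is a minimal sketch that addresses columns by the letters used above, so it does not depend on exact header names. It assumes one header row and that missing or invalid samples appear as blanks or non-numeric tokens such as `NA`; the function names are hypothetical.

```python
import csv
import io
import math
from collections import defaultdict


def column_index(letter):
    """Convert a spreadsheet column letter (A, B, ...) to a 0-based index."""
    return ord(letter.upper()) - ord("A")


def mean_by_treatment(csv_text, value_column="J"):
    """Average one signal column per (participant, treatment) pair."""
    pid_i = column_index("A")
    treatment_i = column_index("C")
    value_i = column_index(value_column)
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # assumption: the first row is a header
    for row in reader:
        try:
            value = float(row[value_i])
        except (ValueError, IndexError):
            continue  # blank or non-numeric sample
        if math.isnan(value):
            continue
        entry = totals[(row[pid_i], row[treatment_i])]
        entry[0] += value
        entry[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}
```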
----------
**Keyboard Data File**
In the Keyboard Data file, in addition to the columns holding the participant ID (**Column A**), group information (**Column B**), treatment information (**Column C**), time information (**Column D**), and task information (**Column E**), there are columns holding keystroke information. Specifically:
**Column F: Is\_Key\_Up:** 0 stands for key released, while 1 stands for key depressed.
**Column G: Key:** Alphanumeric code of the key that is either released or depressed.
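
As an illustration of how the keystroke columns can be used, the sketch below counts key depressions per key code. It follows the coding stated above (1 for depressed) but exposes the flag as a parameter, since it is worth confirming against the data. Rows are assumed to be lists in column order A to G, and the key codes in the usage example are made up.

```python
from collections import Counter

IS_KEY_UP_INDEX = 5  # Column F
KEY_INDEX = 6        # Column G


def count_presses(rows, pressed_flag="1"):
    """Count key-depressed events per key code.

    pressed_flag defaults to "1" per the column description above;
    pass "0" if inspection of the data suggests the opposite coding.
    """
    counts = Counter()
    for row in rows:
        if row[IS_KEY_UP_INDEX].strip() == pressed_flag:
            counts[row[KEY_INDEX].strip()] += 1
    return counts


# Usage with synthetic rows (timestamps and key codes are illustrative only).
rows = [
    ["T001", "BH", "DT", "t0", "Report", "1", "a"],
    ["T001", "BH", "DT", "t1", "Report", "0", "a"],
    ["T001", "BH", "DT", "t2", "Report", "1", "Back"],
    ["T001", "BH", "DT", "t3", "Report", "1", "a"],
]
presses = count_presses(rows)
```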
----------
**Report Data File**
In the Report Data file, in addition to the columns holding the participant ID (**Column A**), group information (**Column B**), and treatment information (**Column C**), there are columns holding
report length measures, writing quality
measures by the *e-rater* scoring engine of the Educational Testing Service (ETS)
(Burstein et al.), and usage measures for the Delete key.
**Column D: Word_Count:** The number of words in the report.
**Column E: Character_Count:** The number of characters in the report.
**Column F: Criterion_Score:** The overall report quality score given by the
*e-rater*.
**Column G: Mechanics_Errors:** Number of mechanics errors in the report,
such as spelling errors; it is provided by the *e-rater*.
**Column H: Grammar_Errors:** Number of grammar errors in the report,
such as subject-verb agreement errors; it is provided by the *e-rater*.
**Column I: Usage_Errors:** Number of usage errors in the report, such as article
errors; it is provided by the *e-rater*.
**Column J: Style_Errors:** Number of style errors in the report, such as repetition
of words and very short or very long sentences; it is provided by the *e-rater*.
**Column K: Delete_Key_Count:** The number of times the backward and forward delete
keys were depressed during the writing of the report. This information is extracted
from the Keyboard Data file.
**Column L: Mechanics_Errors/WC:** The number of mechanics errors divided
by the number of words in the report.
**Column M: Grammar_Errors/WC:** The number of grammar errors divided
by the number of words in the report.
**Column N: Usage_Errors/WC:** The number of usage errors divided by the
number of words in the report.
**Column O: Style_Errors/WC:** The number of style errors divided by the
number of words in the report.
**Column P: Delete_Key/CC:** The number of times the backward and forward delete keys were depressed during the writing of the report, divided by the number of characters in the report.
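
The normalized measures in Columns L to P are plain ratios of the counts in Columns D to K. A minimal sketch of that arithmetic follows; it assumes a row has been read into a dictionary keyed by the column names above, and it returns `None` when the denominator is zero, which is a design choice of this example rather than something the file documents.

```python
ERROR_COLUMNS = ("Mechanics_Errors", "Grammar_Errors", "Usage_Errors", "Style_Errors")


def ratio(count, total):
    """Return count / total, or None when total is not positive."""
    if total <= 0:
        return None
    return count / total


def normalized_measures(row):
    """Recompute the per-word and per-character measures for one report."""
    words = float(row["Word_Count"])
    characters = float(row["Character_Count"])
    measures = {
        f"{name}/WC": ratio(float(row[name]), words) for name in ERROR_COLUMNS
    }
    measures["Delete_Key/CC"] = ratio(float(row["Delete_Key_Count"]), characters)
    return measures
```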
----------
**Thermal Imaging Data Subfolder**
This subfolder contains the facial thermal imaging sequences acquired during
experimentation via the S-Interface (Buddharaju et al.). These sequences can be used for extraction
of additional physiological indicators, such as breathing signals (Fei & Pavlidis), the reextraction
of perinasal perspiration signals, or other computer vision research.
The files holding the thermal imaging sequences are in a binary format called
.dat. Each .dat file is accompanied by a text file .inf. The header of each .inf
file has three numbers: (1) The number of thermal frames contained in the corresponding
.dat file. (2) The width of each thermal frame. (3) The height of
each thermal frame. The body of each .inf file contains the timestamps of all
thermal frames contained in the corresponding .dat file. The S-Interface
uses .inf files to properly open the corresponding .dat files and process them.
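
For readers who want to inspect an .inf file without the S-Interface, the description above suggests a parser along the following lines. This is only a sketch: the exact text layout is an assumption (three integer header values, whitespace- or newline-separated, followed by one timestamp per line), timestamps are kept as raw strings because their format is not specified here, and `parse_inf` is a hypothetical name. The pixel encoding of the .dat files is not described in this section, so the sketch does not attempt to read them.

```python
def parse_inf(text):
    """Parse the header and timestamp lines of an .inf file (assumed layout)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = []
    body_start = 0
    # Collect integers from the leading lines until three are found.
    for i, line in enumerate(lines):
        header.extend(int(token) for token in line.split())
        body_start = i + 1
        if len(header) >= 3:
            break
    if len(header) != 3:
        raise ValueError("expected exactly three header numbers")
    frame_count, width, height = header
    timestamps = lines[body_start:]
    return {
        "frame_count": frame_count,
        "width": width,
        "height": height,
        "timestamps": timestamps,
        # A mismatch hints at a truncated file or a wrong layout assumption.
        "count_matches": len(timestamps) == frame_count,
    }
```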
**Textual Data Folder**
-------------------
----------
**Reports and Emails File**
This Excel file holds the ST (**Column C**) and DT (**Column D**) reports of participants,
as well as the eight emails they wrote (**Column E** to **Column L**); its
size is 124.8 KB.
**Ancillary Media Folder**
--------------------------
----------
**Facial Videos**
Visual videos of participants' faces during experimentation. They are in MPEG-4 format, named $\text{T}_{\text{xyz}}$-FV.mp4; their total size is 53.6 GB.
**Operational Theater Videos**
Visual videos of participants' desktop area during experimentation. They are in MPEG-4 format, named $\text{T}_{\text{xyz}}$-OTV.mp4; their total size is 53.6 GB.
**Computer Screen Videos**
Visual videos of participants' computer screen during experimentation. They are in MPEG-4 format, named $\text{T}_{\text{xyz}}$-CSV.mp4; their total size is 69.1 GB.
**Thermal MROI Videos**
Videos of participants' perinasal measurement region of interest (MROI) extracted through the S-Interface. The PP signals are computed upon these MROIs. These videos are in MPEG-4 format, named $\text{T}_{\text{xyz}}$-MROI.mp4; their total size is 3.05 GB.
**Tools Folder**
------------
----------
This folder contains the interfaces, applications, and videos needed to reproduce
the present experiment and collect additional data. Specifically: (a) p-Interface
for executing the experimental protocol; (b) Stroop application for stress priming
in the BH and CH groups; (c) natural landscape video for relaxation priming
in the BL and CL groups; (d) panel of judges video delivered to participants
during the presentation treatment; (e) Survey Gizmo links for delivery of the
experiment’s questionnaires. The only experimental tool that is missing is the
S-Interface, which is held in a general purpose repository (Buddharaju et al.), as it is software
with broader applicability.
**Data Records of HRV - Supplementary Data Folder**
---------------------------------------------------
----------
Under the Supplementary Data folder, there is a comma separated value (csv) file that holds the HRV data (15.9 MB). In this file, in addition to the columns holding the participant ID (**Column A**) and group information (**Column B**), there are columns holding treatment | task information (**Column C** | **Column D**), absolute | relative timing (**Column E** | **Column F**), and the RR values (**Column G**). As is the case with all other variables, this repository holds the quality controlled RR values. The raw variable values and the R code that operates upon them to implement the quality control and validation processes reside on GitHub (Data Citation 2).
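
As one example of what the RR values in Column G support, the sketch below computes RMSSD, a common time-domain HRV measure defined as the root mean square of successive RR differences. It is not presented as the analysis used for this dataset. The result is in the same unit as the input, and the sketch assumes the RR values passed in are consecutive beats from a single participant and treatment; gaps left by quality control would need to be handled before calling it.

```python
import math


def rmssd(rr_values):
    """Root mean square of successive differences of an RR series."""
    if len(rr_values) < 2:
        raise ValueError("at least two RR values are required")
    diffs = [later - earlier for earlier, later in zip(rr_values, rr_values[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Usage with a synthetic series (values are illustrative only).
example = rmssd([800.0, 810.0, 790.0, 800.0])
```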
**References**
----------
McCrae, R. R. & Costa Jr, P. T. A five-factor theory of personality. In
John, O. P., Robins, R. W. & Pervin, L. A. (eds.) *Handbook of Personality:
Theory and Research*, chap. 5, 159–181 (The Guilford Press, New York,
NY, 2008).
Gross, J. J. & John, O. P. Individual differences in two emotion regulation
processes: Implications for affect, relationships, and well-being. *Journal of
Personality and Social Psychology* **85**, 348–362 (2003).
Cohen, S. Perceived stress in a probability sample of the United States. In
Spacapan, S. & Oskamp, S. (eds.) *The Claremont Symposium on Applied
Social Psychology. The Social Psychology of Health*, 31–67 (Sage Publications,
Inc, Thousand Oaks, CA, 1988).
Hart, S. G. & Staveland, L. E. Development of NASA-TLX (Task Load
Index): Results of empirical and theoretical research. In Hancock, P. A.
& Meshkati, N. (eds.) *Human Mental Workload*, 139–183 (North-Holland,
Amsterdam, 1988).
Burstein, J., Tetreault, J. & Madnani, N. The E-rater® automated essay
scoring system. In Shermis, M. D. & Burstein, J. (eds.) *Handbook of
Automated Essay Evaluation*, chap. 4, 77–89 (Routledge, New York, NY,
2013).
Buddharaju, P., Khatri, A. & Pavlidis, I. Software for: S-Interface
(formerly OTACS). https://figshare.com/articles/OTACS_Software/4244273/6 (2018).
Fei, J. & Pavlidis, I. Thermistor at a distance: unobtrusive measurement
of breathing. *IEEE Transactions on Biomedical Engineering* **57**, 988–998
(2010).
**Data Citations**
--------------
1. Zaman, S., Wesley, A., Cunha, D., Buddharaju, P., Akbar, F., Gao, G.,
Mark, G., Gutierrez-Osuna, R., & Pavlidis, I. Office Tasks 2019 – A Multimodal
Dataset. Open Science Framework https://doi.org/10.17605/osf.io/zd2tn (2019).
2. Zaman, S. & Pavlidis, I. Office-Tasks-2019-Methods. GitHub
https://github.com/UH-CPL/Office-Tasks-2019-Methods (2019).