This repository contains analysis scripts for the study described in our manuscript, "[The validity and utility of activity logs as a measure of student engagement][1]," published in the proceedings of the 2019 Learning Analytics and Knowledge conference.

Four files are available:

1. [FeatureExtraction.ipynb][2]
2. [CourseClustering.R][3]
3. [ModelingEngagement.R][4]
4. [ModelSummaries.html][5]

The three scripts (the first three files above) ingest data containing confidential records that cannot be shared. For this reason, they are not provided as functional code, but rather as a transparent, persistent, and open account of the methods described in our article. The article contains a thorough narrative description of these analyses; making the scripts available provides a more complete and precise record of the study methods. All results reported in the paper match the output of these scripts as provided here.

**[FeatureExtraction.ipynb][6]** is an IPython notebook that was executed on Google Cloud Platform using Cloud Datalab. Our Canvas web logs (Canvas Requests) are maintained in a BigQuery dataset; the notebook executes SQL queries that extract features from the web logs and exports them as individual CSV files, one per course. While there is currently no data dictionary for Canvas Requests, we occasionally join these web logs to structured Canvas Data tables, and the Canvas data dictionary for those tables is available [here][7]. Once the individual CSV files were exported, we manually concatenated them into a single `features.csv` file.

**[CourseClustering.R][8]** is an R script that ingests `features.csv`, appends student contextual variables (`course_career_status.csv`) and outcomes (`ser_tag_grd_data.csv`), removes courses that meet the study's exclusion criteria, and then runs a clustering routine on the remaining courses. This script produces `featuresWithClusters.csv`.
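The manual concatenation of per-course exports and the joins at the start of CourseClustering.R can be illustrated with the minimal Python sketch below. It is a hypothetical illustration under stated assumptions, not the authors' code: the helper names (`concatenate_course_csvs`, `left_join`), the column names (`course_id`, `student_id`, `n_requests`, `grade`), the choice of join keys, and the exclusion rule are all made up, and the study's actual exclusion criteria and clustering routine are not reproduced.

```python
import csv
import io


def concatenate_course_csvs(course_csvs):
    """Stack per-course CSV exports into one table with a single header row.

    course_csvs: dict mapping a (hypothetical) course ID to the CSV text
    exported for that course. Every export is assumed to share one header.
    """
    header, rows = None, []
    for course_id, text in course_csvs.items():
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError(f"header mismatch in course {course_id!r}")
        rows.extend(reader)
    return header, rows


def left_join(rows, other_rows, keys):
    """Append columns from other_rows to each row whose `keys` all match.

    Rows without a match are kept unchanged (left-join semantics); columns
    already present in a row are never overwritten.
    """
    index = {tuple(r[k] for k in keys): r for r in other_rows}
    joined = []
    for row in rows:
        merged = dict(row)
        match = index.get(tuple(row[k] for k in keys), {})
        for col, val in match.items():
            merged.setdefault(col, val)
        joined.append(merged)
    return joined


# Usage with made-up data; all column names here are hypothetical.
exports = {
    "C1": "course_id,student_id,n_requests\nC1,s1,10\nC1,s2,4\n",
    "C2": "course_id,student_id,n_requests\nC2,s1,7\n",
}
header, rows = concatenate_course_csvs(exports)
outcomes = [{"course_id": "C1", "student_id": "s1", "grade": "A"}]
joined = left_join(rows, outcomes, keys=("course_id", "student_id"))

# A hypothetical exclusion rule standing in for the study's real criteria.
kept = [r for r in joined if r["course_id"] != "C2"]
```

Checking that every export shares the same header guards against silently misaligned columns, which is the main risk when concatenating files by hand.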
**[ModelingEngagement.R][9]** is an R script that ingests `featuresWithClusters.csv`, re-appends the outcomes data (`ser_tag_grd_data.csv`), rescales the features into z-scores, and then fits logistic regression models separately for the courses in each of the six clusters. It then conducts a sensitivity analysis by sourcing a publicly available [dprime function][10].

**[ModelSummaries.html][11]** is the HTML export (created using knitr) of the R commands and output produced when running the [ModelingEngagement.R][12] script. As such, it includes full model summaries, including diagnostics and coefficient values.

[1]: https://doi.org/10.1145/3303772.3303789
[2]: https://osf.io/ghsbf/
[3]: https://osf.io/7cbzx/
[4]: https://osf.io/a85en/
[5]: https://osf.io/54dtj/
[6]: https://osf.io/ghsbf/
[7]: https://portal.inshosteddata.com/docs
[8]: https://osf.io/7cbzx/
[9]: https://osf.io/a85en/
[10]: https://github.com/neuropsychology/neuropsychology.R/blob/master/R/dprime.R
[11]: https://osf.io/54dtj/
[12]: https://osf.io/a85en/
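Two small computations named in the ModelingEngagement.R description, rescaling features into z-scores and computing the signal-detection sensitivity d′ = z(hit rate) − z(false-alarm rate), can be sketched in Python as below. This is a hypothetical illustration, not the authors' R code and not a port of the linked dprime function. The helper names (`zscores`, `confusion_counts`, `dprime`), the assumption that d′ is computed from a model's 0/1 predictions against observed outcomes, and the log-linear correction for extreme rates are all assumptions. The per-cluster logistic regression models are not reproduced.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical helpers for illustration; not the authors' code.


def zscores(values):
    """Rescale values to mean 0 and unit sample standard deviation (n - 1),
    the same convention as R's scale()."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def confusion_counts(predicted, observed):
    """Count hits, false alarms, misses and correct rejections for 0/1 labels."""
    pairs = list(zip(predicted, observed))
    n_hit = sum(p == 1 and o == 1 for p, o in pairs)
    n_fa = sum(p == 1 and o == 0 for p, o in pairs)
    n_miss = sum(p == 0 and o == 1 for p, o in pairs)
    n_cr = sum(p == 0 and o == 0 for p, o in pairs)
    return n_hit, n_fa, n_miss, n_cr


def dprime(n_hit, n_fa, n_miss, n_cr):
    """d' = z(hit rate) - z(false-alarm rate), z = inverse standard normal CDF.

    Assumed log-linear correction: add 0.5 to the hit and false-alarm counts
    and 1 to each total, so rates of exactly 0 or 1 stay finite.
    """
    hit_rate = (n_hit + 0.5) / (n_hit + n_miss + 1)
    fa_rate = (n_fa + 0.5) / (n_fa + n_cr + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


# Usage with made-up data.
scaled = zscores([2.0, 4.0, 6.0])  # [-1.0, 0.0, 1.0]
counts = confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])  # (2, 1, 1, 2)
d = dprime(*counts)  # about 0.64
```

Using the sample standard deviation mirrors R's `scale()`; whether the original script standardized features within each cluster or across all courses is not stated on this page.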