The Upworthy Research Archive is an open dataset of thousands of A/B tests of headlines conducted by Upworthy from January 2013 to April 2015. This repository includes the full data from the archive.

**Learn more about the archive at the project's main website.**

This repository includes the following files:

- The Python code used to create the research samples (Python version 3.8.3)
- **upworthy-archive-datasets/**: tests from the archive, split into four files:
  - **Exploratory data**: upworthy-archive-exploratory-packages-03.12.2020
  - **Confirmatory data**: upworthy-archive-confirmatory-packages-03.12.2020.csv
  - **Holdout data**: upworthy-archive-holdout-packages-03.12.2020.csv
  - **Undeployed data**: upworthy-archive-undeployed-packages.01.12.2021.csv
- **google-analytics-data/**:
  - Contextual information exported from the Upworthy Google Analytics account
- **upworthy-timestamped-screenshots/**:
  - Time-stamped screenshots showing articles on the Upworthy website that were linked to from tests

### Project Team:

- J. Nathan Matias, Assistant Professor, Cornell University: co-lead
- Kevin Munger, Assistant Professor, Penn State University: co-lead
- Marianne Aubin Le Quere, PhD student, Cornell University: data validation and documentation
- Charles Ebersole, Postdoc, University of Virginia: data controller

### License:

The Upworthy Research Archive is available through an agreement between Good/Upworthy and Cornell University. Cornell University is publishing the Upworthy Research Archive, all code in this repository, and documentation under a Creative Commons Attribution 4.0 International License.
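
### Example: reading a dataset file:

The CSV files listed under upworthy-archive-datasets/ can be read with Python's standard library. The following is a minimal sketch, not the project's own code. The helper names `load_packages` and `group_by_test` are hypothetical, and the column name `clickability_test_id` is an assumption about the file layout that should be checked against the CSV header before use.

```python
import csv
import os
from collections import defaultdict


def load_packages(csv_path):
    """Read one archive CSV file into a list of row dictionaries."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def group_by_test(rows, test_id_column="clickability_test_id"):
    """Group rows by the A/B test they belong to.

    The default column name is an assumption; pass the actual
    identifier column found in the CSV header if it differs.
    """
    tests = defaultdict(list)
    for row in rows:
        tests[row[test_id_column]].append(row)
    return dict(tests)


if __name__ == "__main__":
    # Path built from the directory and file names listed in this README.
    path = os.path.join(
        "upworthy-archive-datasets",
        "upworthy-archive-confirmatory-packages-03.12.2020.csv",
    )
    if os.path.exists(path):
        rows = load_packages(path)
        tests = group_by_test(rows)
        print(f"{len(rows)} rows across {len(tests)} tests")
```

Grouping by test keeps the rows that were compared against each other together, which is usually the unit of analysis for A/B test data.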