This repository accompanies the paper of the same name, published in the Proceedings of the National Academy of Sciences of the United States of America (https://www.pnas.org/content/early/2020/04/24/1906364117). It contains all the county-level language features and census covariates that were used in the paper.
It also includes the WWBP Life Satisfaction language model, which can be applied to text to predict the author's Life Satisfaction score on a 0-10 scale, that is, the author's predicted response to the Cantril Ladder question.
Details of how the Life Satisfaction model was trained, along with the underlying dataset statistics and the survey questionnaire, are provided in the SI Appendix accompanying the paper, which is uploaded under "Supplementary Materials."
Instructions for how to apply the model using the Python DLATK package are provided in the Code folder.
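As a rough illustration only of what applying a language model of this kind involves, the sketch below scores a text with a toy weighted-word model and restricts the result to the 0-10 range of the Cantril Ladder. It is not the DLATK workflow documented in the Code folder, and it assumes a simple linear form; the weights, the intercept, and the function names are all invented for this example and do not come from the WWBP model.

```python
import re
from collections import Counter

# Hypothetical weights and intercept, invented for illustration only.
# The actual model's features and weights are the ones shipped in this repo.
TOY_WEIGHTS = {"grateful": 12.0, "excited": 8.0, "tired": -9.0, "hate": -15.0}
TOY_INTERCEPT = 6.0


def relative_frequencies(text):
    """Lowercase the text, tokenize it, and return each token's share of all tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {}
    return {tok: n / total for tok, n in Counter(tokens).items()}


def predict_life_satisfaction(text, weights=TOY_WEIGHTS, intercept=TOY_INTERCEPT):
    """Intercept plus weighted sum of relative frequencies, clipped to [0, 10]."""
    freqs = relative_frequencies(text)
    score = intercept + sum(weights.get(tok, 0.0) * f for tok, f in freqs.items())
    return min(10.0, max(0.0, score))


if __name__ == "__main__":
    print(predict_life_satisfaction("grateful and tired today"))
```

Words without a weight contribute nothing, so a text containing none of the weighted words receives the intercept as its score.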
Cite as:
K. Jaidka, J. C. Eichstaedt, S. Giorgi, Data and resources for estimating geographic subjective well-being from Twitter: A comparison of dictionary and data-driven language methods. Open Science Framework. https://osf.io/jqk6f/. Deposited 7 April 2020.