**Paper:**
https://doi.org/10.1371/journal.pone.0194290
**Data:**
The data consists of three files:
- *outcomes*: dense table with county drinking and socio-demographic variables
- *topics*: topic frequencies for each county in sparse format
- *1grams*: word frequencies for each county in sparse format
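Here "sparse format" means one row per (county, feature) pair, not one column per feature. The sketch below pivots such rows into a per-county mapping. It assumes the DLATK feature-table column names `group_id` (county), `feat` (topic or word) and `group_norm` (relative frequency). Check these against the header of the actual files before relying on them. The county IDs and values in the example are made-up toy data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def pivot_sparse(rows: Iterable[Mapping[str, str]],
                 group_col: str = "group_id",
                 feat_col: str = "feat",
                 value_col: str = "group_norm") -> Dict[str, Dict[str, float]]:
    """Turn sparse (county, feature, value) rows into {county: {feature: value}}.

    The default column names follow the DLATK feature-table convention
    (an assumption about these files). A feature missing for a county is
    simply absent from that county's inner dict, i.e. implicitly zero.
    """
    dense: Dict[str, Dict[str, float]] = defaultdict(dict)
    for row in rows:
        dense[row[group_col]][row[feat_col]] = float(row[value_col])
    return dict(dense)


# Toy rows (hypothetical county IDs and values, not taken from the dataset).
rows = [
    {"group_id": "county_1", "feat": "beer", "group_norm": "0.25"},
    {"group_id": "county_1", "feat": "wine", "group_norm": "0.5"},
    {"group_id": "county_2", "feat": "beer", "group_norm": "0.125"},
]
print(pivot_sparse(rows))
# {'county_1': {'beer': 0.25, 'wine': 0.5}, 'county_2': {'beer': 0.125}}
```

Keeping the inner mappings sparse avoids materializing a very wide county-by-word matrix for the 1gram features, most of whose cells would be zero.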
**Data Files:**
Data is available in both CSV and MySQL formats:
- CSV
  - outcomes.csv
  - feat.cat_met_a30_2000_cp_w.msgs_2011to13.cnty.16to16.csv.zip
  - feat.1gram.msgs_2011to13.cnty.16to16.0_1.csv.zip
- MySQL
  - county_drinking_plosone2018.sql.zip
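The two feature CSVs are distributed as zip archives. The sketch below reads one without extracting it first, using only the Python standard library. It assumes that each archive contains a single UTF-8 CSV file with a header row. `outcomes.csv` is not zipped and can be passed straight to `csv.DictReader`.

```python
import csv
import io
import zipfile
from typing import Dict, List


def read_zipped_csv(zip_path) -> List[Dict[str, str]]:
    """Read the first .csv member of a zip archive into a list of row dicts.

    Assumes a header row and UTF-8 encoding. `zip_path` may be a filename
    or any binary file-like object accepted by zipfile.ZipFile.
    """
    with zipfile.ZipFile(zip_path) as zf:
        member = next((n for n in zf.namelist() if n.endswith(".csv")), None)
        if member is None:
            raise ValueError(f"no .csv file found in {zip_path!r}")
        # Stream-decode the member; newline="" lets csv handle line endings.
        with io.TextIOWrapper(zf.open(member), encoding="utf-8", newline="") as text:
            return list(csv.DictReader(text))
```

For example, `read_zipped_csv("feat.1gram.msgs_2011to13.cnty.16to16.0_1.csv.zip")` returns one dict per (county, word) row. Because the 1gram file may be large, a generator that yields rows instead of building a list would use less memory.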
**Analysis:**
All analyses were run with the [DLATK Python package][1]. Because DLATK works with MySQL, the data is also provided as a single, convenient SQL dump.
[1]: http://dlatk.wwbp.org