We reanalyze Eichstaedt et al.’s (2015a) claim to have shown that language patterns among Twitter users, aggregated at the level of U.S. counties, predicted county-level mortality rates from atherosclerotic heart disease (AHD), with “negative” language being associated with higher rates of death from AHD and “positive” language associated with lower rates. First, we examine some of Eichstaedt et al.’s apparent assumptions about the nature of AHD, as well as some issues related to the secondary analysis of online data and to considering counties as communities. Next, using the data files supplied by Eichstaedt et al., we reanalyze their numerical results, including testing their model with mortality from an alternative cause of death, namely suicide. We identify numerous conceptual and methodological limitations that call into question the robustness and generalizability of Eichstaedt et al.’s claims. In particular, we find that the purported associations between “negative” and “positive” language and mortality are reversed, and in some cases even become stronger, when suicide is used as the outcome variable. We conclude that there is no evidence that analyzing Twitter data in bulk in this way can add anything useful to our ability to understand geographical variation in AHD mortality rates.
CC-By Attribution 4.0 International