## 1. Code

https://github.com/danielmlow/reddit

## 2. Data

**Please cite if you use this dataset:**

Low, D. M., Rumker, L., Torous, J., Cecchi, G., Ghosh, S. S., & Talkar, T. (2020). Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study. *Journal of Medical Internet Research*, 22(10), e22635.

    @article{low2020natural,
      title={Natural Language Processing Reveals Vulnerable Mental Health Support Groups and Heightened Health Anxiety on Reddit During COVID-19: Observational Study},
      author={Low, Daniel M and Rumker, Laurie and Torous, John and Cecchi, Guillermo and Ghosh, Satrajit S and Talkar, Tanya},
      journal={Journal of medical Internet research},
      volume={22},
      number={10},
      pages={e22635},
      year={2020},
      publisher={JMIR Publications Inc., Toronto, Canada}
    }

**License**

This dataset is made available under the Public Domain Dedication and License v1.0, whose full text can be found at: http://www.opendatacommons.org/licenses/pddl/1.0/

It was downloaded using the Pushshift API. Re-use of this data is subject to the Reddit API terms.

### 2.1. Reddit Mental Health Dataset

Find in `data/input/reddit_mental_health_dataset/`

**Contains:** Posts and text features for the following timeframes from 28 mental health and non-mental health subreddits:

- **15 specific mental health support groups** (r/EDAnonymous, r/addiction, r/alcoholism, r/adhd, r/anxiety, r/autism, r/bipolarreddit, r/bpd, r/depression, r/healthanxiety, r/lonely, r/ptsd, r/schizophrenia, r/socialanxiety, and r/suicidewatch)
- **2 broad mental health** subreddits (r/mentalhealth, r/COVID19_support)
- **11 non-mental health subreddits** (r/conspiracy, r/divorce, r/fitness, r/guns, r/jokes, r/legaladvice, r/meditation, r/parenting, r/personalfinance, r/relationships, r/teaching).

`filenames` and corresponding timeframes:

- `post:` Jan 1 to April 20, 2020 (called "mid-pandemic" in the manuscript; r/COVID19_support appears).
- `pre:` Dec 2018 to Dec 2019. A full year, which provides more data for a baseline of Reddit posts.
- `2019:` Jan 1 to April 20, 2019 (r/EDAnonymous appears). A control for seasonal fluctuations to match `post` data.
- `2018:` Jan 1 to April 20, 2018. A control for seasonal fluctuations to match `post` data.

See Supplementary Materials for more information.

Note: if subsampling (e.g., to balance subreddits), we recommend bootstrapping analyses for unbiased results.

### 2.2. COVID-19 mention dataset (Figure 1)

Find in `data/input/covid19_counts/`

Same posts as in `post` above for 15 mental health subreddits. The following tokens are counted:

`'corona', 'virus', 'viral', 'covid', 'sars', 'influenza', 'pandemic', 'epidemic', 'quarantine', 'lockdown', 'distancing', 'national emergency', 'flatten', 'infect', 'ventilator', 'mask', 'symptomatic', 'epidemiolog', 'immun', 'incubation', 'transmission', 'vaccine'`

* One column `covid19_boolean`: whether any of these tokens appears at least once (Figure 1).
* One column `covid19_total`: total count of these tokens.
* One column `covid19_weighed_words`: total count of these tokens normalized by the number of words (`n_words`) in a post (Figure S3).

### 2.3. COVID-19 cases

Confirmed COVID-19 cases obtained from ourworldindata.org/covid-cases (source: European CDC).
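
To make the three columns in Section 2.2 concrete, here is a minimal Python sketch of how they could be derived from a single post. It is not the code used to build the dataset (see the repository in Section 1 for that). The function name `covid19_features`, the case-insensitive substring matching, and the whitespace-based word count are all assumptions made for illustration.

```python
# Token list copied from Section 2.2.
COVID19_TOKENS = [
    "corona", "virus", "viral", "covid", "sars", "influenza", "pandemic",
    "epidemic", "quarantine", "lockdown", "distancing", "national emergency",
    "flatten", "infect", "ventilator", "mask", "symptomatic", "epidemiolog",
    "immun", "incubation", "transmission", "vaccine",
]


def covid19_features(post: str) -> dict:
    """Hypothetical helper: compute the three Section 2.2 columns for one post.

    Assumptions (not confirmed by the dataset description):
    - matching is case-insensitive and substring-based, since several tokens
      look like stems (e.g. 'epidemiolog', 'immun');
    - n_words is the number of whitespace-separated words.
    """
    text = post.lower()
    # Count non-overlapping occurrences of every token in the post.
    total = sum(text.count(token) for token in COVID19_TOKENS)
    n_words = len(text.split())
    return {
        "covid19_boolean": total > 0,
        "covid19_total": total,
        # Guard against empty posts to avoid division by zero.
        "covid19_weighed_words": total / n_words if n_words else 0.0,
    }


features = covid19_features("I wear a mask because of the pandemic")
print(features)
```

Under these assumptions, a word such as "coronavirus" contributes two counts (`corona` and `virus`); whether the published columns behave the same way should be checked against the repository code.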