# Polarization Dictionary
The dictionary is released as `final_dict.csv`.
For a breakdown by subcomponent, see `dict_hclust.csv`.
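A common way to apply a word dictionary is to compute, for each message, the share of its tokens that are dictionary terms. The Python sketch below is illustrative only and is not the project's own scoring code. It assumes that `final_dict.csv` has a header row with one term per row (read from the first column unless a column name is given), that entries are single words matched case-insensitively with no wildcards, and that `dict_hclust.csv` pairs each term with its subcomponent in columns named `word` and `cluster`. These column names and all function names (`load_terms`, `group_by_subcomponent`, `dictionary_share`) are hypothetical, so check them against the actual files.

```python
import csv
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set

# Deliberately simple ASCII tokenizer: lowercase letters and apostrophes only.
TOKEN_RE = re.compile(r"[a-z']+")


def load_terms(lines: Iterable[str], column: Optional[str] = None) -> Set[str]:
    """Read dictionary terms from CSV lines (e.g. an open final_dict.csv).

    Assumes a header row; uses `column` if given, otherwise the first column.
    """
    reader = csv.DictReader(lines)
    if not reader.fieldnames:
        return set()
    key = column or reader.fieldnames[0]
    return {
        (row.get(key) or "").strip().lower()
        for row in reader
        if (row.get(key) or "").strip()
    }


def group_by_subcomponent(
    lines: Iterable[str], term_col: str = "word", group_col: str = "cluster"
) -> Dict[str, List[str]]:
    """Group terms by subcomponent; both column names are hypothetical defaults."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(lines):
        term = (row.get(term_col) or "").strip().lower()
        if term:
            groups[(row.get(group_col) or "").strip()].append(term)
    return dict(groups)


def dictionary_share(text: str, terms: Set[str]) -> float:
    """Fraction of tokens in `text` that are dictionary terms (0.0 if none)."""
    tokens = TOKEN_RE.findall(text.lower())
    if not tokens:
        return 0.0
    return sum(token in terms for token in tokens) / len(tokens)
```

Because `load_terms` and `group_by_subcomponent` accept any iterable of lines, an open file handle (opened with `newline=""`) can be passed directly. Exact matching keeps the sketch simple; if the dictionary turns out to contain stems or wildcard entries, the membership test in `dictionary_share` would need to become a prefix match.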
___
# How to Get the Data
## Disclaimer
This project uses many datasets from different sources. We are unable to share most of them in full (e.g., Twitter data), but we provide all the information needed to retrieve the data independently.
## Validations
### Cross-Validation
The original data are available at https://osf.io/67zkd/.
### BLM Validation
The original data from Arif et al. (2018) are available at https://github.com/leo-gs/ira-reproducibility.
We were able to retrieve only part of the original data; the tweet IDs used in our analysis are listed in `sts_to_share.csv`.
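Since `sts_to_share.csv` contains only tweet IDs, the tweets themselves have to be re-collected ("hydrated") through the Twitter API. The Python sketch below illustrates two details that are easy to get wrong. First, tweet IDs are 64-bit integers larger than 2^53, so they should be kept as strings: parsing them as floating-point numbers (as spreadsheets often do) silently rounds them. Second, Twitter's tweet-lookup endpoints have historically accepted at most 100 IDs per request, so IDs are usually sent in batches. The function names `read_ids` and `batches`, and the assumption that the file has a header row, are hypothetical; if no column name is given, the first column is used.

```python
import csv
from typing import Iterable, Iterator, List, Optional


def read_ids(lines: Iterable[str], column: Optional[str] = None) -> List[str]:
    """Read tweet IDs as strings from CSV lines (assumes a header row).

    IDs are never converted to numbers, so no precision is lost.
    Uses `column` if given, otherwise the first column; blank cells are skipped.
    """
    reader = csv.DictReader(lines)
    if not reader.fieldnames:
        return []
    key = column or reader.fieldnames[0]
    ids: List[str] = []
    for row in reader:
        value = (row.get(key) or "").strip()
        if value:
            ids.append(value)
    return ids


def batches(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` IDs (default: 100 per request)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    batch: List[str] = []
    for tweet_id in ids:
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

For example, open the file with `open("sts_to_share.csv", newline="", encoding="utf-8")`, pass the handle to `read_ids`, and send each list produced by `batches(ids)` to whichever lookup endpoint or hydration tool you have access to. As an illustration of the rounding problem, `float("1234567890123456789")` is stored as 1234567890123456768.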
### Reddit Validation
We release the Reddit datasets as message IDs so that anyone can collect the messages independently: `control_reddits_full_ids.rds` and `poli_reddits_full_ids.rds`. The list of 1,000 subreddits was taken from https://github.com/saiarcot895/reddit-visualizations/blob/master/subreddits-list.csv
### COVID-19 Validation
The Civiqs data (originally posted at https://civiqs.com/results/coronavirus_concern?annotations=true&uncertainty=true&zoomIn=true&sumTotals=true&net=true&party=Republican) are released as `civiqs_official_net_concern.csv`.
COVID-19 Twitter data can be retrieved using the tweet IDs listed in the R data file `covid_ids_to_share.rds`.
## Trolls Analysis
Twitter data for the American controls can be retrieved using the tweet IDs listed in the R data file `american_control_ids.rds`.
For the content-matching analysis, use the R data file `tweets_controls_content_matching_ids.rds`.
For the troll data, see https://transparency.twitter.com/en/information-operations.html and https://github.com/fivethirtyeight/russian-troll-tweets.
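The fivethirtyeight repository splits its troll tweets across several CSV files. A minimal Python sketch for streaming those files as a single sequence and keeping only a subset of rows is shown below. It assumes that every file has the same header and includes `language` and `content` columns with values such as `English`; these column names and values should be checked against the repository's own documentation. The function name `iter_filtered_content` is hypothetical, and this is not necessarily the filtering applied in our analysis.

```python
import csv
from typing import Iterable, Iterator


def iter_filtered_content(
    files: Iterable[Iterable[str]],
    filter_col: str = "language",  # assumed column name
    keep_value: str = "English",  # assumed cell value
    text_col: str = "content",  # assumed column name
) -> Iterator[str]:
    """Yield the text column of every row whose `filter_col` equals `keep_value`.

    Each element of `files` is an iterable of CSV lines with its own header row,
    such as an open file handle; files are read lazily, one after another.
    """
    for lines in files:
        for row in csv.DictReader(lines):
            if (row.get(filter_col) or "") == keep_value:
                yield row.get(text_col) or ""
```

Assuming the shard files follow an `IRAhandle_tweets_*.csv` naming pattern in a local clone, `glob.glob` can list them; open each with `newline=""` and `encoding="utf-8"` and pass the handles in. Because the function is a generator, the full corpus never has to be held in memory at once.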