This dataset contains the classified tweets, the tweet IDs, and the retweet networks used in the article "Political polarization of news media and influencers on Twitter in the 2016 and 2020 US presidential elections".

Authors: James Flamino, Alessandro Galeazzi, Stuart Feldman, Michael W. Macy, Brendan Cross, Zhenkun Zhou, Matteo Serafino, Alexandre Bovet, Hernan A. Makse, Boleslaw K. Szymanski.

The classification of news outlets in the different media categories is a matter of opinion, rather than a statement of fact. This opinion originated in publicly available datasets from fact-checking organizations, i.e. www.opensources.co (copy at https://github.com/alexbovet/opensources), www.mediabiasfactcheck.com & www.allsides.com. This classification of news media should not be interpreted as representing the opinions of the authors of the article.

-----------------------------------------

There are 5 folders.

`Classified_Tweets`: Contains the lists of tweet IDs of the tweets we were able to classify through the link they contain, together with the corresponding news outlet category.

`Influencers_Classification`: Contains the classification of the most important influencers into a category.

`Retweet_Networks`: The retweets_network_final.tar.gz archive contains 8 retweet network CSV files, one for each news category. Every entry in a CSV file corresponds to a retweet of a tweet classified in that file's news category. These files are edge lists, with each retweet being a directed edge from node infl-id to node auth-id. The columns are as follows: (id), i.e. the ID of the retweet, (auth-id), i.e. the user ID of the user who authored the retweet (the user who is being influenced), and (infl-id), i.e. the user ID of the influencer, the one who wrote the original tweet that is now being retweeted.
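As an illustration of the edge-list layout, the following is a minimal sketch of how one of these CSV files could be read with the Python standard library. It assumes that each file starts with a header row naming the columns exactly `id`, `auth-id` and `infl-id`; we have not verified this here, so adjust the names if the files differ. The function name `load_retweet_edges` and the sample rows are hypothetical and only serve the example. Repeated retweets between the same pair of users are collapsed into an edge weight.

```python
import csv
import io
from collections import Counter


def load_retweet_edges(handle):
    """Count retweets per directed (influencer, retweeting user) pair.

    Assumes a header row with the columns `id`, `auth-id`, `infl-id`.
    """
    weights = Counter()
    for row in csv.DictReader(handle):
        # Edge direction: influencer (infl-id) -> retweeting user (auth-id).
        weights[(row["infl-id"], row["auth-id"])] += 1
    return weights


# Made-up sample rows standing in for a real edge-list file.
sample = io.StringIO(
    "id,auth-id,infl-id\n"
    "1001,20,10\n"
    "1002,20,10\n"
    "1003,30,10\n"
)
edges = load_retweet_edges(sample)

# Total number of retweets received by each influencer.
out_strength = Counter()
for (infl, _auth), weight in edges.items():
    out_strength[infl] += weight

print(edges[("10", "20")], out_strength["10"])
```

For a real file, the same function could be called as `load_retweet_edges(open("center_retweet_edges.csv", newline=""))`. User IDs are kept as strings so that no precision is lost if the data is later passed to tools that store numbers as floats.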
The 8 retweet network files are:

- `center_retweet_edges.csv`
- `fake_retweet_edges.csv`
- `left_extreme_retweet_edges.csv`
- `left_leaning_retweet_edges.csv`
- `left_retweet_edges.csv`
- `right_extreme_retweet_edges.csv`
- `right_leaning_retweet_edges.csv`
- `right_retweet_edges.csv`

`Tweet_IDs`: The tweet_ids_2016.txt.zst file contains the IDs of tweets, retweets, and quotes for the 2016 election. In the file tweet_ids_2020.txt we provide the link to download the IDs of tweets, retweets, and quotes for the 2020 election.

`Figure_2`: Contains the raw data to reproduce Figure 2.

The media categories and the news outlets in each category are detailed in our article. The retweet networks contain parallel edges whenever a user retweeted another user more than once.

Software such as [hydrator](https://github.com/DocNow/hydrator) and [tweepy](https://www.tweepy.org/) can be used to “rehydrate” the tweet IDs, i.e. download the full tweet objects using the tweet IDs.
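Rehydration tools usually send tweet IDs to the API in batches. The following is a minimal sketch of that preparation step, assuming the decompressed ID file holds one tweet ID per line (an assumption about the file layout, not something stated above). The helper name `batch_tweet_ids` is hypothetical, and the default batch size of 100 reflects the per-request limit of Twitter's tweet lookup endpoints as we understand it, so it may need adjusting. The actual API call is left out.

```python
import io


def batch_tweet_ids(lines, size=100):
    """Yield lists of at most `size` tweet IDs, skipping blank lines.

    Assumes one tweet ID per line in the input.
    """
    batch = []
    for line in lines:
        tweet_id = line.strip()
        if not tweet_id:
            continue
        batch.append(tweet_id)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


# Made-up IDs 1..250 standing in for the decompressed ID file.
sample = io.StringIO("\n".join(str(i) for i in range(1, 251)) + "\n")
batches = list(batch_tweet_ids(sample))
print([len(b) for b in batches])  # [100, 100, 50]
```

The `.zst` file has to be decompressed first, for example with the `zstd` command-line tool (`zstd -d tweet_ids_2016.txt.zst`). Each batch can then be handed to the lookup function of whichever hydration tool is used.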