<h1>Political Astroturfing on Twitter: How to Coordinate a Disinformation Campaign</h1> <h3>Franziska B. Keller, David Schoch, Sebastian Stier, JungHwan Yang</h3> <h2>Replication materials</h2> <p>The data and code provided here enable the replication of all analyses in <strong><em>Political Astroturfing on Twitter: How to Coordinate a Disinformation Campaign</em></strong>.</p> <h2>Content</h2> <p>The archive contains the raw and aggregated data, as well as the R and bash scripts underlying the analyses in the paper and its supplementary material.</p> <h2>R packages</h2> <p>Most of the analyses were done in R. The packages used are listed below, together with the code to install them.</p>

```r
# start from an empty workspace
rm(list = ls())

if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman")
}

if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

# install (if missing) and load the CRAN packages, without updating installed ones
pacman::p_load(tidyverse, tidytext, lubridate, igraph, extrafont, rvest,
               data.table, blockcluster, rtweet, update = FALSE)

# add github packages (specific commit is given as ref argument)
if (!requireNamespace("patchwork", quietly = TRUE)) {
  remotes::install_github("thomasp85/patchwork", ref = "fd7958b")
}
```

<h2>R scripts</h2>
<table>
<thead><tr><th>script</th><th>description</th></tr></thead>
<tbody>
<tr><td><code>figures.R</code></td><td>produces the figures of the paper and the supplementary material</td></tr>
<tr><td><code>text_analysis.R</code></td><td>reproduces the text analysis results</td></tr>
<tr><td><code>detection.R</code></td><td>contains all detection methods</td></tr>
<tr><td><code>analysis.R</code></td><td>can only be run with the deanonymized data (shared for completeness only)</td></tr>
</tbody>
</table>
<h2>Bash scripts</h2> <p>The co-tweets and co-retweets were computed using a series of bash scripts. The scripts are shared as-is, but they require the non-anonymized data to run properly.
The code assumes that a raw tweet file has the following columns:</p> <ul> <li><code>user_name</code>: <code>user_screen_name</code> of the account</li> <li><code>user_id</code>: user ID of the account</li> <li><code>date</code>: date and time of the tweet</li> <li><code>text</code>: text of the tweet</li> <li><code>rt_user</code>: <code>user_screen_name</code> of the retweeted account, or <code>NA</code> if the tweet is not a retweet</li> </ul> <p>Columns are separated by semicolons (<code>;</code>).</p> <h2>System requirements</h2> <p>All analyses were run on a ThinkPad running Ubuntu 18.04 with 16 GB of RAM and R version 3.5.1.</p> <h2>Data files</h2> <p>All data needed to reproduce the results of the main paper are included in this archive.</p> <ul> <li><code>.rds</code>/<code>.RDS</code> files can be read in R with <code>readRDS()</code>.</li> <li><code>.RData</code> files can be loaded in R with <code>load()</code>.</li> <li><code>.graphml</code> files can be read with <code>igraph::read_graph(file, format = "graphml")</code>.<br> NOTE: R sometimes throws an error when reading these graphml files. In that case, try visone (<a href="http://visone.info" rel="nofollow">visone.info</a>) instead.</li> </ul> <p>All <code>user_name</code>s and <code>user_id</code>s of accounts not related to the NIS campaign (including suspect accounts) were anonymized to protect the identities of these users. The <code>user_name</code>s and <code>user_id</code>s of NIS accounts are not anonymized.
Accounts that currently exist under a (former) NIS <code>user_name</code> were registered after 2012 and are thus <strong>not</strong> related to the campaign.</p> <h3>Data to reproduce all figures</h3>
<table>
<thead><tr><th>data file</th><th>description</th></tr></thead>
<tbody>
<tr><td><code>times.RData</code></td><td>three data frames of tweet counts per hour, day and weekday for regular and NIS accounts</td></tr>
<tr><td><code>nis_retweet_network.graphml</code></td><td>retweet network among NIS accounts</td></tr>
<tr><td><code>cotweet_nis_60.graphml</code></td><td>co-tweet network among NIS accounts (60-second threshold)</td></tr>
<tr><td><code>coretweet_nis_60.graphml</code></td><td>co-retweet network among NIS accounts (60-second threshold)</td></tr>
<tr><td><code>hourly_activity_suspects.rds</code></td><td>data frame of tweet counts per hour of suspect accounts</td></tr>
<tr><td><code>weekday_activity_suspects.rds</code></td><td>data frame of tweet counts per weekday of suspect accounts</td></tr>
<tr><td><code>impact_data.RData</code></td><td>three data frames with the number of followers, mentions and retweets received by NIS, suspect, opinion leader and regular accounts</td></tr>
<tr><td><code>nis_tweets.csv</code></td><td>tweets sent by NIS accounts, without the tweet text</td></tr>
<tr><td><code>suspect_tweets.csv</code></td><td>tweets sent by suspect accounts, without the tweet text (anonymized)</td></tr>
<tr><td><code>cotweets_nis_threshold/</code></td><td>folder containing all raw co-tweet network data of NIS accounts for different time thresholds</td></tr>
<tr><td><code>coretweets_nis_threshold/</code></td><td>folder containing all raw co-retweet network data of NIS accounts for different time thresholds</td></tr>
<tr><td><code>cotweet_all_times.csv</code></td><td>all co-tweets occurring in the dataset as an edge list, together with the time difference between the two posts (non-NIS accounts are anonymized)</td></tr>
<tr><td><code>coretweet_all_times_aggreg.rds</code></td><td>number of accounts detected via co-retweeting when the time threshold is varied</td></tr>
<tr><td><code>rt_counts_retweeted_by_nis.rds</code></td><td>data frame of retweet counts of accounts that were retweeted by an NIS account, broken down by category (NIS, opinion leaders and randomly sampled accounts)</td></tr>
<tr><td><code>retweets_network.csv</code></td><td>retweet network of the whole dataset as an edge list</td></tr>
<tr><td><code>cotweet_network.rds</code></td><td>complete co-tweet network of the dataset (60-second threshold)</td></tr>
<tr><td><code>coretweet_network.rds</code></td><td>complete co-retweet network of the dataset (60-second threshold)</td></tr>
<tr><td><code>rt_received_by_group.rds</code></td><td>list of two vectors with the number of retweets received by regular, NIS and opinion leader accounts. The first vector counts the suspect accounts as regular users; the second groups them with the NIS accounts.</td></tr>
<tr><td><code>rt_received_by_group_pol.rds</code></td><td>same as <code>rt_received_by_group.rds</code>, but counting only tweets with political keywords</td></tr>
<tr><td><code>ko_word_pattern.txt</code></td><td>list of Korean stopwords and stemming patterns</td></tr>
<tr><td><code>rt_account.RDS</code></td><td>accounts retweeted by the different groups of accounts</td></tr>
<tr><td><code>word_count.RDS</code></td><td>counts of words used by the different groups of accounts, by day</td></tr>
<tr><td><code>words_by_group.RDS</code></td><td>counts of words used by the different groups of accounts, by day, excluding retweets</td></tr>
<tr><td><code>words_by_week.RDS</code></td><td>counts of words used by the different groups of accounts, by week</td></tr>
</tbody>
</table>
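<p>The loading functions listed under Data files can be combined as in the following minimal sketch. It loads an <code>.RData</code> file into its own environment, so that the names of the stored objects (which this page does not list, e.g. for <code>times.RData</code>) can be inspected without overwriting anything in the workspace. The demo writes its own small stand-in files to a temporary directory; the helpers <code>load_rdata()</code> and <code>read_graphml()</code> are hypothetical and not part of the archive.</p>

```r
# Minimal sketch: open the archive's data files without cluttering the
# workspace. Helper names are hypothetical; the demo files are stand-ins
# for the archive's files (e.g. times.RData, hourly_activity_suspects.rds).

# Load an .RData file into a fresh environment; ls() on the result lists the
# names of the stored objects.
load_rdata <- function(path) {
  env <- new.env()
  load(path, envir = env)
  env
}

# Read a graphml file. igraph's read_graph() defaults to format = "edgelist",
# so the format has to be given explicitly. Requires the igraph package.
read_graphml <- function(path) {
  igraph::read_graph(path, format = "graphml")
}

# Demo: write two small stand-in files to a temporary directory.
tmp <- tempdir()
rdata_file <- file.path(tmp, "example.RData")
rds_file   <- file.path(tmp, "example.rds")

per_hour <- data.frame(hour = 0:2, n = c(5L, 3L, 8L))
per_day  <- data.frame(day = as.Date("2012-12-01") + 0:1, n = c(10L, 12L))
save(per_hour, per_day, file = rdata_file)
saveRDS(per_hour, rds_file)
rm(per_hour, per_day)

env <- load_rdata(rdata_file)
print(ls(env))               # names of the data frames stored in the file
hourly <- readRDS(rds_file)  # readRDS() returns a single object
str(hourly)
```

<p>Loading into a separate environment matters mainly because <code>load()</code> silently overwrites existing objects of the same name; objects can then be pulled out with <code>get("name", envir = env)</code>.</p>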
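<p>For readers who want to prepare input in the format the bash scripts expect, the following is a minimal sketch of reading a raw tweet file with the five semicolon-separated columns described under Bash scripts. Several details are assumptions not stated in the archive: that the file has a header row, that <code>date</code> uses the format <code>%Y-%m-%d %H:%M:%S</code> in UTC, and that texts containing <code>;</code> are wrapped in double quotes. The helper <code>read_raw_tweets()</code> and the example rows are hypothetical.</p>

```r
# Minimal sketch: read a raw tweet file with the columns user_name, user_id,
# date, text and rt_user, separated by ";".
# Assumptions (not stated in the archive): a header row, dates formatted as
# "%Y-%m-%d %H:%M:%S" in UTC, and double quotes around texts containing ";".
read_raw_tweets <- function(path) {
  tweets <- read.table(
    path,
    sep = ";", header = TRUE,
    quote = "\"",           # apostrophes in tweets are not quote characters
    comment.char = "",      # otherwise "#" (hashtags) would start a comment
    na.strings = "NA",      # rt_user is NA for tweets that are not retweets
    colClasses = "character", encoding = "UTF-8"
  )
  expected <- c("user_name", "user_id", "date", "text", "rt_user")
  missing_cols <- setdiff(expected, names(tweets))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  tweets$date <- as.POSIXct(tweets$date, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  tweets$is_retweet <- !is.na(tweets$rt_user)
  tweets
}

# Demo with two made-up rows (hypothetical accounts).
raw_file <- tempfile(fileext = ".csv")
writeLines(c(
  "user_name;user_id;date;text;rt_user",
  "acc_a;1;2012-12-01 10:00:00;hello #world;NA",
  "acc_b;2;2012-12-01 10:00:30;\"it's; with semicolon\";acc_c"
), raw_file)

tweets <- read_raw_tweets(raw_file)
str(tweets)
```

<p>The explicit <code>quote</code> and <code>comment.char</code> settings are the main design choice here: with the <code>read.table()</code> defaults, apostrophes would open quoted fields and any hashtag would cut off the rest of its line.</p>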
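<p>The co-tweet and co-retweet networks in the archive were computed with the authors' bash scripts, which need the non-anonymized data. The following is only a minimal R sketch of the idea behind a time-threshold co-tweet network, under an assumed definition: two distinct accounts co-tweet when they post identical text within <code>threshold</code> seconds (60 seconds for the main networks), and co-retweet when they retweet the same content within that window, approximated here by identical <code>rt_user</code> and <code>text</code>. Edge weights count how often a pair co-occurs. The function <code>co_tweet_edges()</code> and the toy data are hypothetical.</p>

```r
# Minimal sketch, not the authors' bash implementation. Assumed definition:
# two distinct accounts co-tweet when they post identical text within
# `threshold` seconds; with retweets = TRUE, two accounts co-retweet when they
# retweet the same content (approximated by identical rt_user and text)
# within that window. Expects columns user_name, date (POSIXct), text, rt_user.
co_tweet_edges <- function(tweets, threshold = 60, retweets = FALSE) {
  is_rt <- !is.na(tweets$rt_user)
  d <- tweets[if (retweets) is_rt else !is_rt, ]
  # Screen names contain no tabs, so "rt_user<TAB>text" is an unambiguous key.
  key <- if (retweets) paste(d$rt_user, d$text, sep = "\t") else d$text
  pairs <- list()
  for (idx in split(seq_len(nrow(d)), key)) {
    if (length(idx) < 2) next
    idx <- idx[order(d$date[idx])]           # posts of one group, oldest first
    for (i in seq_len(length(idx) - 1)) {
      for (j in (i + 1):length(idx)) {
        dt <- as.numeric(difftime(d$date[idx[j]], d$date[idx[i]], units = "secs"))
        if (dt > threshold) break            # later posts are even further away
        u <- d$user_name[idx[i]]
        v <- d$user_name[idx[j]]
        if (u != v) pairs[[length(pairs) + 1]] <- sort(c(u, v))
      }
    }
  }
  if (length(pairs) == 0) {
    return(data.frame(from = character(), to = character(), weight = integer()))
  }
  m <- do.call(rbind, pairs)
  # Edge weight = number of co-occurrences of the (undirected) account pair.
  edges <- aggregate(list(weight = rep(1L, nrow(m))),
                     by = list(from = m[, 1], to = m[, 2]), FUN = sum)
  edges[order(edges$from, edges$to), ]
}

# Toy data with hypothetical accounts a, b, c (no retweets).
tweets <- data.frame(
  user_name = c("a", "b", "c", "a", "b"),
  date = as.POSIXct("2012-12-01 10:00:00", tz = "UTC") + c(0, 30, 200, 500, 540),
  text = c("same text", "same text", "same text", "other", "other"),
  rt_user = NA_character_,
  stringsAsFactors = FALSE
)
print(co_tweet_edges(tweets, threshold = 60))   # a and b co-tweet twice
print(co_tweet_edges(tweets, threshold = 300))
```

<p>The resulting edge list could be turned into a network with <code>igraph::graph_from_data_frame(edges, directed = FALSE)</code>, which stores the third column as the edge attribute <code>weight</code>. Comparing all pairs within a group of identical texts is quadratic in the group size; the early <code>break</code> on the time-sorted posts keeps this manageable when repeats are spread out, but the sketch is not meant for the full dataset.</p>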