This project contains the data, the code, and the results of several analyses of stories published on Wattpad. The corpus reflects the state of Wattpad as of January 2018, based on the sitemap files found on the server; it is therefore not a complete dataset of all Wattpad's stories. More information about corpus building, analyses, and results can be found in the **article**: Pianzola, F., Rebora, S., and Lauer, G. "Wattpad as a resource for literary studies in the 21st century: Quantitative and qualitative examples of the importance of digital social reading and readers' comments in the margins". Under review.

- The **\Titles** folder contains several files listing all the story titles retrieved from the URLs found in the website sitemap.
- The **\Stories_statistics_and_metadata** folder contains lists of all the titles in the "Classics" and "Teen fiction" categories, including metrics for the number of reads, votes, and comments.
- The **\Language_and_words** folder contains the results of language detection performed with the R package cld2, word frequency lists for the 13 most used languages, and files with the stopwords used in our article.
- The **\Users_statistics** folder contains anonymised information about the number of comments written by users who read 12 stories in English (6 in the "Classics" category and 6 in the "Teen fiction" category), as well as information about the users' geographical location.
- The **\Sentiment_analysis** folder contains the data used to run sentiment analysis on the same 12 English stories with the R package syuzhet. For the "Classics", we provide the full texts with the corresponding percentages used in our article. For the "Teen fiction" stories, we include a list of the paragraphs we removed before running the analysis. For copyright reasons, we can provide neither the texts of the "Teen fiction" stories nor the users' comments.
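As a minimal illustrative sketch (not the project's actual pipeline; the example strings are placeholders), language detection with cld2 and sentence-level sentiment scoring with syuzhet can be run in R like this:

```r
# Illustrative sketch only: detect language with cld2 and score
# sentence-level sentiment with syuzhet, the two R packages named above.
# install.packages(c("cld2", "syuzhet"))
library(cld2)
library(syuzhet)

texts <- c("It was the best of days, and she knew it.",  # placeholder snippets
           "Sie liebte Geschichten mehr als alles andere.")

# cld2 returns a language code per text (NA when undecided)
langs <- cld2::detect_language(texts)

# Split one story into sentences and score each with the "syuzhet" lexicon
sentences <- syuzhet::get_sentences(texts[1])
scores    <- syuzhet::get_sentiment(sentences, method = "syuzhet")
```

The same two calls scale to full story texts; only the "Classics" texts are redistributable here, so the "Teen fiction" analyses cannot be reproduced from this repository alone.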
- The **\Networks** folder contains the data for the nodes and edges of the network visualization made with Gephi.

The **R Markdown files** describing the procedures for network analysis and sentiment analysis can be found in the GitHub repository:
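For orientation, Gephi's spreadsheet importer reads separate node and edge tables shaped roughly like the sketch below (the column names and values are illustrative assumptions; the actual files in the \Networks folder may carry additional columns):

```
# nodes table — one row per node; "Id" is required, "Label" is optional
Id,Label
u1,reader_001
s1,story_01

# edges table — "Source" and "Target" must match node Ids
Source,Target,Type,Weight
u1,s1,Undirected,3
```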