<p>This project contains the data, the code, and the results of several analyses of stories published on <a href="http://wattpad.com" rel="nofollow">wattpad.com</a>.</p> <p>The corpus reflects the state of Wattpad as of January 2018, based on the sitemap files found on the server. It is therefore not a complete dataset of all Wattpad stories. More information about corpus building, analyses, and results can be found in the <strong>article</strong>: Pianzola, F., Rebora, S., and Lauer, G. "Wattpad as a resource for literary studies in the 21st century: Quantitative and qualitative examples of the importance of digital social reading and readers’ comments in the margins". Under review.</p> <p>The <strong>\Titles</strong> folder contains several files listing all the story titles retrieved from the URLs found in the website's sitemap.</p> <p>The <strong>\Stories_statistics_and_metadata</strong> folder contains lists of all the titles in the "Classics" and "Teen fiction" categories, with metrics for the number of reads, votes, and comments.</p> <p>The <strong>\Language_and_words</strong> folder contains the results of language detection performed with the R package cld2. It also contains word frequency lists for the 13 most used languages and the stopword lists used in our article.</p> <p>The <strong>\Users_statistics</strong> folder contains anonymised information about the number of comments written by users who read 12 stories in English (6 in the "Classics" category and 6 in the "Teen fiction" category). It also contains information about the users' geographical location.</p> <p>The <strong>\Sentiment_analysis</strong> folder contains the data used to run sentiment analysis with the R package Syuzhet on 12 stories in English (6 in the "Classics" category and 6 in the "Teen fiction" category). For the "Classics", we provide the full texts together with the corresponding percentages used in our article. 
For the "Teen fiction" stories, we include a list of the paragraphs we removed before running the analysis. For copyright reasons, we can provide neither the texts of the "Teen fiction" stories nor the users' comments.</p> <p>The <strong>\Networks</strong> folder contains the node and edge data for the network visualization created with Gephi.</p> <p>The <strong>R Markdown files</strong> documenting the network analysis and sentiment analysis procedures can be found in the GitHub repository: <a href="https://github.com/SimoneRebora/Wattpad_analysis" rel="nofollow">https://github.com/SimoneRebora/Wattpad_analysis</a></p>
