Wattpad titles corpus
Date created: 2019-05-23 01:16 AM | Last Updated: 2024-04-30 05:05 PM
Identifier: DOI 10.17605/OSF.IO/5GXMN
Category: Data
Description: This is a collection of the titles of stories published on Wattpad as of January 2018. The corpus contains around 30 million titles in more than 50 languages. It consists mainly of original fiction, with a small share of fan fiction (roughly 10%). The R Markdown files documenting the network analysis and sentiment analysis procedures can be found in the GitHub repository: We published an article based on this data
This project contains the data, the code, and the results of several analyses of stories published on Wattpad.
The corpus reflects the state of Wattpad as of January 2018, based on the sitemap files found on the server. It is therefore not a complete dataset of all Wattpad stories. More information about how the corpus was built, the analyses, and the results can be found in the article: Pianzola, F., Rebora, S, an…