Wattpad titles corpus




Category: Data

Description: This is a collection of all the story titles published on Wattpad as of January 2018. It is a corpus of around 30 million titles in more than 50 languages. It includes mainly original fiction and a small share of fan fiction (roughly 10%). The R Markdown files for the network analysis and sentiment analysis procedures can be found in the GitHub repository: https://github.com/SimoneRebora/Wattpad_analysis We published an article based on this data: https://doi.org/10.1371/journal.pone.0226708

License: CC-By Attribution 4.0 International


This project contains the data, the code, and the results of some analyses of stories published on wattpad.com. The corpus reflects the state of Wattpad as of January 2018, based on the sitemap files found on the server. Thus, it is not a complete dataset of all of Wattpad's stories. More information about corpus building, analyses, and results can be found in the article: Pianzola, F., Rebora, S, an...




