Date created: 2019-05-23 01:16 AM | Last Updated: 2024-04-30 05:05 PM

Identifier: DOI 10.17605/OSF.IO/5GXMN

Category: Data

Description: This is a collection of all story titles published on Wattpad as of January 2018. It is a corpus of around 30 million titles in more than 50 different languages. It consists mainly of original fiction, with a small share of fan fiction (roughly 10%). The R Markdown files for the network analysis and sentiment analysis procedures are available in the GitHub repository. We published an article based on this data.

License: CC-By Attribution 4.0 International


This project contains the data, the code, and the results of some analyses of stories published on Wattpad.

The corpus reflects the state of Wattpad as of January 2018, based on the sitemap files found on the server. It is therefore not a complete dataset of all Wattpad stories. More information about corpus building, analyses, and results can be found in the article: Pianzola, F., Rebora, S, an…


Tags: books, classics, DH, digital humanities, digital literary studies, empirical literary studies, fiction, language recognition, literary modeling, literature, narrative, narratology, natural language processing, NLP, novels, reader response, readers, reading, sentiment analysis, social media, social reading, stories, teenagers, Wattpad, world literature

