Date created: 2019-05-23 01:16 AM | Last Updated: 2024-04-30 05:05 PM

Identifier: DOI 10.17605/OSF.IO/5GXMN

Category: Data

Description: This is a collection of all the story titles published on Wattpad as of January 2018. The corpus contains around 30 million titles in more than 50 languages. It consists mainly of original fiction, with a small share of fan fiction (roughly 10%). The R Markdown files documenting the network analysis and sentiment analysis procedures are available on GitHub (https://github.com/SimoneRebora/Wattpad_analysis). We published an article based on this data (https://doi.org/10.1371/journal.pone.0226708).

License: CC-By Attribution 4.0 International

Wiki

This project contains the data, the code, and the results of several analyses of stories published on wattpad.com.

The corpus reflects the state of Wattpad as of January 2018, based on the sitemap files found on the server. It is therefore not a complete dataset of all Wattpad stories. More information about corpus building, analyses, and results can be found in the article: Pianzola, F., Rebora, S, an…

Tags

books, classics, DH, digital humanities, digital literary studies, empirical literary studies, fiction, language recognition, literary modeling, literature, narrative, narratology, natural language processing, NLP, novels, reader response, readers, reading, sentiment analysis, social media, social reading, stories, teenagers, Wattpad, world literature
