Integral Dataset (subset of Project Gutenberg)
Category: Data
Description: This project contains the Integral dataset, a corpus of pre-processed texts extracted from Project Gutenberg. The corpus is split into a training set (4.4 GB), a validation set (1.1 GB), and a test set (1.1 GB).
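To make the split easier to reason about, the quoted sizes can be turned into proportions. The following is a minimal sketch, not part of the dataset's tooling: the names `SPLIT_SIZES` and `split_proportions` are hypothetical, and the only inputs are the three sizes stated in the description.

```python
# Split sizes as quoted in the description. The ratio does not depend
# on the unit, so the values are used as plain numbers.
SPLIT_SIZES = {"train": 4.4, "validation": 1.1, "test": 1.1}


def split_proportions(sizes):
    """Return each split's share of the total size as a fraction in [0, 1]."""
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}


proportions = split_proportions(SPLIT_SIZES)
for name, share in proportions.items():
    print(f"{name}: {share:.1%}")
```

By size, the split is roughly 4:1:1, so about two thirds of the text is in the training set and one sixth each in the validation and test sets.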