



Category: Project

Description: Our corpus consists of internet texts from the IIA and excerpts from books written in Talian. Text processing is done in R (R Core Team, 2023), and optical character recognition (OCR) is carried out with Google’s Tesseract (Smith, 2007). As a starting point, we used Tesseract's Italian trained data and later checked the OCR output for potential mismatches.
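
The description does not say how the mismatch check was carried out. As a rough illustration only, one way to flag potential mismatches is to compare OCR output against a manually transcribed reference passage, word by word. The sketch below is in Python rather than R for brevity, uses only the standard library, and is not the project's method: the function name `find_mismatches` and the placeholder strings are hypothetical.

```python
from difflib import SequenceMatcher


def find_mismatches(ocr_text: str, reference_text: str) -> list[tuple[str, str]]:
    """Return (ocr_fragment, reference_fragment) pairs where the texts differ.

    Hypothetical helper: assumes a hand-checked reference transcription
    exists for the same passage that was run through OCR.
    """
    # Compare on whitespace-separated tokens so differences are reported per word.
    ocr_tokens = ocr_text.split()
    ref_tokens = reference_text.split()
    matcher = SequenceMatcher(None, ocr_tokens, ref_tokens, autojunk=False)

    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # An empty string on one side marks a pure insertion or deletion.
            mismatches.append(
                (" ".join(ocr_tokens[i1:i2]), " ".join(ref_tokens[j1:j2]))
            )
    return mismatches


if __name__ == "__main__":
    # Placeholder strings, not real corpus data.
    for ocr_word, ref_word in find_mismatches("alpha beta gamma", "alpha bela gamma"):
        print(f"OCR: {ocr_word!r}  reference: {ref_word!r}")
```

Under these assumptions, the flagged word pairs could be reviewed by hand to see whether the Italian trained data systematically misreads particular spellings.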

License: CC-By Attribution 4.0 International


