Category: Project
Description: Our corpus consists of internet texts from the IIA and excerpts from books written in Talian. Text processing is being done in R (R Core Team, 2023), and optical character recognition (OCR) is being carried out with Google's Tesseract (Smith, 2007). As a starting point, we used Tesseract's Italian trained data and then checked the OCR output for potential mismatches.
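The mismatch check could be done in several ways. One minimal sketch aligns the OCR output word by word with a manually corrected transcription of the same page and lists the spans that differ. It assumes such a transcription is available and uses Python's standard `difflib`. The function name `find_mismatches` and the example strings are hypothetical illustrations, not the project's actual procedure or data.

```python
import difflib


def find_mismatches(ocr_text: str, reference_text: str) -> list[tuple[str, str]]:
    """Return (ocr_span, reference_span) pairs where the two texts differ.

    Hypothetical helper: aligns the texts as sequences of whitespace-separated
    words. An empty string on either side means words were dropped or added.
    """
    ocr_words = ocr_text.split()
    ref_words = reference_text.split()
    # autojunk=False keeps frequent words from being treated as "junk".
    matcher = difflib.SequenceMatcher(a=ocr_words, b=ref_words, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # "replace", "delete", or "insert"
            mismatches.append(
                (" ".join(ocr_words[i1:i2]), " ".join(ref_words[j1:j2]))
            )
    return mismatches


# Invented example of typical OCR confusions (not from the corpus).
print(find_mismatches("tbe quick brovvn fox", "the quick brown fox"))
```

Comparing word lists keeps the report readable. Passing the two strings directly to `SequenceMatcher` would instead give a character-level alignment that pinpoints individual misrecognized letters.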