Talian corpus: a written corpus of Brazilian Veneto
Category: Project
Description: Our corpus consists of internet texts from the IIA as well as excerpts from books written in Talian. Text processing is being done in R (R Core Team, 2023), and optical character recognition (OCR) is being carried out using Google’s Tesseract (Smith, 2007). As a starting point, we used Tesseract’s trained data for Italian, and then checked the OCR output for potential mismatches.
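The description does not say how the OCR output was checked for mismatches. As a rough illustration only, one way such a check could be sketched is to flag tokens that contain digits inside words or letters outside an expected alphabet. The function name `flag_suspect_tokens` and the `EXPECTED_LETTERS` set are hypothetical and are not taken from the project; the project's own processing is done in R.

```python
import re
import unicodedata

# Hypothetical alphabet: an illustrative assumption, not the project's
# actual character inventory for Talian.
EXPECTED_LETTERS = set("abcdefghijklmnopqrstuvwxyzàèéìòù")

TOKEN_RE = re.compile(r"\S+")
PUNCTUATION = ".,;:!?\"'()[]«»“”‘’"


def flag_suspect_tokens(text, expected=EXPECTED_LETTERS):
    """Return (token, reason) pairs for tokens that look like OCR errors."""
    suspects = []
    for token in TOKEN_RE.findall(text):
        # Drop punctuation attached to the edges of the token.
        core = token.strip(PUNCTUATION)
        if not core:
            continue
        lowered = unicodedata.normalize("NFC", core).lower()
        has_letter = any(ch.isalpha() for ch in lowered)
        has_digit = any(ch.isdigit() for ch in lowered)
        if has_letter and has_digit:
            # e.g. "0" read in place of "o"
            suspects.append((core, "digit inside word"))
        elif any(ch.isalpha() and ch not in expected for ch in lowered):
            suspects.append((core, "unexpected letter"))
    return suspects


if __name__ == "__main__":
    sample = "La casa l0ngo ñandú."
    for token, reason in flag_suspect_tokens(sample):
        print(f"{token}: {reason}")
```

A check like this only narrows down candidates for manual review; it cannot tell a genuine OCR error from a legitimate spelling that falls outside the assumed alphabet.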