LASTU /
datasets
Category: Data
Description: Finnish datasets for LASTU.

License: CC-BY-SA 4.0.

Derived from Finnish Internet Parsebank:
J. Luotolahti; J. Kanerva; V. Laippala; S. Pyysalo; F. Ginter. Towards Universal Web Parsebanks. Proceedings of the International Conference on Dependency Linguistics (Depling’15). 2015.
https://aclanthology.org/W15-2124/
https://turkunlp.org/finnish_nlp.html

When this database is used, it should be cited as Luotolahti et al. (2015).

Filename schema: lang_source_tokens_minfreq.db, where
- lang: the language (e.g., fi, es)
- source: the data source (e.g., parsebank, tdt)
- tokens: gross token amount (e.g., 50M, 2B)
- minfreq: minimum frequency, or "full" if not applicable (e.g., 10)
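
The filename schema above can be split into its four fields programmatically. The following is a minimal sketch, not part of the LASTU tooling: the function name `parse_dataset_filename`, the `DatasetName` type, and the example filenames are hypothetical, and it assumes that no field contains an underscore.

```python
import re
from typing import NamedTuple, Optional


class DatasetName(NamedTuple):
    """Fields of a lang_source_tokens_minfreq.db filename (hypothetical helper type)."""
    lang: str
    source: str
    tokens: str
    minfreq: Optional[int]  # None when the field is "full"


# Assumption: fields are separated by single underscores and contain none themselves.
_PATTERN = re.compile(
    r"^(?P<lang>[^_]+)_(?P<source>[^_]+)_(?P<tokens>[^_]+)_(?P<minfreq>full|\d+)\.db$"
)


def parse_dataset_filename(filename: str) -> DatasetName:
    """Split a dataset filename into lang, source, tokens and minfreq."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a lang_source_tokens_minfreq.db filename: {filename!r}")
    raw_minfreq = match.group("minfreq")
    minfreq = None if raw_minfreq == "full" else int(raw_minfreq)
    return DatasetName(
        match.group("lang"),
        match.group("source"),
        match.group("tokens"),
        minfreq,
    )


if __name__ == "__main__":
    # Illustrative filenames built from the examples in the schema; they may
    # not correspond to files that actually exist in this component.
    print(parse_dataset_filename("fi_parsebank_2B_10.db"))
    print(parse_dataset_filename("fi_tdt_50M_full.db"))
```

The token field is kept as a string (e.g., "2B") because the schema describes it as a gross amount rather than an exact count.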