This repository contains datasets collected for our study "Funny Accents: Exploring Genuine Interest in Internationalized Domain Names".
We currently publish the following datasets:
* `lgr.tar.gz`: [Label Generation Rulesets](https://www.iana.org/help/idn-repository-procedure) for a large set of TLDs
* `titles.tar.gz`: titles of the root pages of the domains in the [Tranco top million list of 29 August 2018](https://tranco-list.eu/list/RQ4M/1000000)
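
Both datasets are distributed as gzip-compressed tar archives, and this README does not describe the files inside them. The following is a minimal sketch for listing an archive's contents before unpacking it. It assumes only Python 3.9+ and the standard `tarfile` module. The function `list_archive` and the script name `list_archive.py` are hypothetical illustrations and are not part of this repository.

```python
import sys
import tarfile


def list_archive(path: str) -> list[tuple[str, int]]:
    """Return (member name, size in bytes) for every regular file in a .tar.gz archive."""
    # "r:gz" opens the archive for reading with gzip decompression.
    with tarfile.open(path, "r:gz") as archive:
        # Skip directories, links, etc.; keep only regular files.
        return [(m.name, m.size) for m in archive.getmembers() if m.isfile()]


if __name__ == "__main__":
    # Hypothetical usage: python list_archive.py lgr.tar.gz titles.tar.gz
    for path in sys.argv[1:]:
        print(path)
        for name, size in list_archive(path):
            print(f"{size:>12}  {name}")
```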

If you are interested in other data from our study, please [contact us](mailto:victor.lepochat@cs.kuleuven.be).
If you use these datasets in your research, please cite [our paper](https://link.springer.com/chapter/10.1007%2F978-3-030-15986-3_12):
> Le Pochat, Victor; Van Goethem, Tom; Joosen, Wouter (2019) Funny Accents: Exploring Genuine Interest in Internationalized Domain Names. In 20th Passive and Active Measurement Conference. PAM 2019. Lecture Notes in Computer Science, vol 11419. Springer, Cham. DOI: 10.1007/978-3-030-15986-3_12