This dataset is designed for comparing algorithms that adapt a language model to the writing of a particular user. It contains the sent email messages of Enron employees, separated by user and arranged in chronological order.

We based our dataset on the [Enron Personalization Validation Set][1] released by Google and used in this [CHI 2015][2] paper by Fowler et al. on language model personalization. Unlike the original dataset, ours provides the exact normalized text used in our experiments. We also provide related assets, such as our word list and baseline n-gram models, to facilitate future comparisons.

If you use this dataset, please cite our Interspeech 2023 paper:

    @inproceedings{adhikary_personalization,
      author = {Jiban Adhikary and Keith Vertanen},
      title = {Language Model Personalization for Improved Touchscreen Typing},
      booktitle = {Proceedings of the International Conference on Spoken Language Processing},
      location = {Dublin, Ireland},
      month = {August},
      year = {2023},
    }

This material is based upon work supported by the NSF under Grant No. IIS-1750193.

[1]:
[2]: