This repository contains code and models for '*How do the kids speak? Improving educational use of text mining with child-directed language models*'.
In the paper, we found that text mining models trained on child-directed text (from YouTube, television shows, Simple English Wikipedia, and children's books):
1. performed better on automated originality scoring of children's creativity, and
2. exhibited lower gender and racial biases.
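Both findings rest on comparing word vectors. The sketch below illustrates two cosine-based measures. It assumes a simplified semantic-distance score for originality (the distance between the averaged prompt and response vectors) and a toy "he" vs. "she" association probe for bias. Neither is necessarily the exact method used in the paper, and the function names are hypothetical. The functions accept anything that supports `word in kv` and `kv[word]`, such as a Gensim `KeyedVectors` object or a plain dict.

```python
import math
from typing import Mapping, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(words: Sequence[str], kv: Mapping[str, Vector]) -> Optional[list[float]]:
    """Average the vectors of in-vocabulary words; None if no word is known."""
    vecs = [kv[w] for w in words if w in kv]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def semantic_distance(prompt: str, response: str, kv: Mapping[str, Vector]) -> Optional[float]:
    """Hypothetical, simplified originality score: 1 - cosine(prompt, response).

    Text is only lowercased and split on whitespace; there is no punctuation
    handling, stopword removal, or term weighting, which a real pipeline would
    likely add. Larger values mean the response is semantically farther from
    the prompt. Returns None if either side has no known words.
    """
    p = mean_vector(prompt.lower().split(), kv)
    r = mean_vector(response.lower().split(), kv)
    if p is None or r is None:
        return None
    return 1.0 - cosine(p, r)


def gender_association(
    word: str, kv: Mapping[str, Vector], male: str = "he", female: str = "she"
) -> float:
    """Toy bias probe: cos(word, male) - cos(word, female); positive leans male."""
    return cosine(kv[word], kv[male]) - cosine(kv[word], kv[female])
```

With the vectors loaded as `kv`, a call such as `semantic_distance("brick", "use it as a doorstop", kv)` scores one response, and comparing `gender_association` values for the same occupation words across different models gives a rough, informal sense of how their biases differ.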
In addition to being included in this repository, the models are hosted at:
- https://openscoring.du.edu/data/all_weighted_10-12_100k.kv
- https://openscoring.du.edu/data/all_weighted_10-12_100k.kv.vectors.npy
Both files together make up one model in Gensim's KeyedVectors format, which can be converted to other formats with that library. For an example of using the hosted models, see:
https://github.com/massivetexts/motes-corpus/blob/master/analysis/MOTESCorpusBiasAnalysisAndComparison.ipynb
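Gensim stores large arrays from `KeyedVectors.save` in a separate `.npy` file, so both hosted files need to sit side by side before loading. The following is a minimal sketch, assuming the server URLs above stay stable. The helper names (`model_files`, `download_model`, `load_model`) and the local `models` folder are hypothetical, and this is not the code used in the linked notebook. Gensim is imported inside `load_model` only so the download step works without it installed.

```python
import urllib.request
from pathlib import Path

BASE_URL = "https://openscoring.du.edu/data/"
MODEL_NAME = "all_weighted_10-12_100k.kv"


def model_files(name: str = MODEL_NAME) -> list[str]:
    """Both files Gensim needs: the pickled KeyedVectors and its vectors array."""
    return [name, name + ".vectors.npy"]


def download_model(dest: str = "models", name: str = MODEL_NAME) -> Path:
    """Download both files into `dest` (skipping existing ones); return the .kv path."""
    folder = Path(dest)
    folder.mkdir(parents=True, exist_ok=True)
    for filename in model_files(name):
        target = folder / filename
        if not target.exists():
            urllib.request.urlretrieve(BASE_URL + filename, str(target))
    return folder / name


def load_model(path: Path):
    """Load the vectors with Gensim (imported here so downloading works without it)."""
    from gensim.models import KeyedVectors

    # Gensim looks for "<name>.vectors.npy" next to the .kv file automatically.
    return KeyedVectors.load(str(path))
```

Usage would look like `kv = load_model(download_model())`, followed by queries such as `kv.most_similar("dinosaur", topn=5)`. To convert to the plain word2vec text format for use outside Gensim, `kv.save_word2vec_format("vectors.txt")` should work.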
> Organisciak, P., Newman, M., Eby, D., Acar, S. and Dumas, D. (2023), "How do the kids speak? Improving educational use of text mining with child-directed language models", Information and Learning Sciences, https://doi.org/10.1108/ILS-06-2022-0082
If you have questions, email me at peter.organisciak@du.edu, and I'll try to add documentation as questions come up.