This page contains data to accompany the ACL 2023 paper, "A Weakly Supervised Classifier and Dataset of White Supremacist Language" by Yoder et al. ([preprint here][1]).

## Access the data

We vet access to the white supremacist data since it is sensitive (offensive and hateful). To request access to the data, please fill out this Google Form: https://forms.gle/ogSjjuY3NivtyRAY7

If approved, you will receive an email with further instructions to access the data.

Antiracist and "neutral" data used in the paper as counterexamples to the white supremacist data are available without a vetting process. See the [Antiracist and neutral datasets component][2].

If you have any questions, email Michael Miller Yoder at `yoder@cs.cmu.edu`.

## White supremacist data sources

The white supremacist data presented here is sampled from many sources. Please see the ACL 2023 paper for details. Along with data dumps from the Internet Archive and other sources, we sample data from the following papers:

- Alatawi, H. S., Alhothali, A. M., & Moria, K. M. (2021). Detecting White Supremacist Hate Speech Using Domain Specific Word Embedding with Deep Learning and BERT. *IEEE Access*, 9, 106363–106374. https://doi.org/10.1109/ACCESS.2021.3100435
- Calderón, F. H., Balani, N., Taylor, J., Peignon, M., Huang, Y.-H., & Chen, Y.-S. (2021). Linguistic Patterns for Code Word Resilient Hate Speech Identification. *Sensors*, 21(23), 7859. https://doi.org/10.3390/s21237859
- ElSherief, M., Ziems, C., Muchlinski, D., Anupindi, V., Seybolt, J., De Choudhury, M., & Yang, D. (2021). Latent Hatred: A Benchmark for Understanding Implicit Hate Speech. *Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing*, 345–363. https://aclanthology.org/2021.emnlp-main.29/
- Jokubauskaitė, E., & Peeters, S. (2020). Generally Curious: Thematically Distinct Datasets of General Threads on 4chan/pol/. *Proceedings of the International AAAI Conference on Web and Social Media*, 14, 863–867. https://ojs.aaai.org/index.php/ICWSM/article/view/7351
- Papasavva, A., Zannettou, S., Cristofaro, E. D., Stringhini, G., & Blackburn, J. (2020). Raiders of the Lost Kek: 3.5 Years of Augmented 4chan Posts from the Politically Incorrect Board. *Proceedings of the International AAAI Conference on Web and Social Media*, 14, 885–894. https://ojs.aaai.org/index.php/ICWSM/article/view/7354
- Qian, J., ElSherief, M., Belding, E., & Wang, W. Y. (2018). Hierarchical CVAE for Fine-Grained Hate Speech Classification. *Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing*, 3550–3559. https://aclanthology.org/D18-1391.pdf

[1]: https://arxiv.org/abs/2306.15732
[2]: https://osf.io/5xsjr/