# Replication Data for Jacobs G., Van Hee C., Hoste V. Automatic Classification of Participant Roles in Cyberbullying

Accompanying replication source code, data and metadata for the manuscript of Jacobs G., Van Hee C., Hoste V. *Automatic Classification of Participant Roles in Cyberbullying*.

## Contents & Source Code Repositories

- Source code of the pre-trained language model experiments (Section 5 of the manuscript): https://github.com/GillesJ/transformers-cyberbullying-participants
- Source code of the feature engineering and linear classification experiments (Section 4 of the manuscript): https://github.com/GillesJ/linear-participants-role
- Feature vectors for the above experiments are hosted here because they are too large for a GitHub repository (see below).

In this repository, we provide download links to the featurized dataset vector files for both languages, as well as metadata files for indexing feature types and corpus document identifiers.

### English feature vector data

- **EN_feature_vectors.svm.gz**: Feature vector file (3.6 GB; 9 GB decompressed) in SVMLight format, gzip-compressed. It can be loaded directly by most ML libraries (e.g., scikit-learn's `load_svmlight_file` function and LibSVM). Class labels: {0: "not bully", 1: "harasser", 2: "victim", 3: "defender"}.
- **EN_devset_holdout_indices.json**: JSON holdout indices (538 KB) for splitting off the same held-in and holdout instance sets as in the paper's experiments. The indices refer to rows of the SVMLight file.
- **EN_feature_map_dict.pkl**: Feature type mapping dictionary (34.3 MB) that maps the columns of the SVMLight file to their feature types (e.g., word 3-grams: columns 0-14230). This file is a Python 2.7.12 object serialized with the standard `pickle` module.

### Dutch feature vector data

- **NL_feature_vectors.svm.gz**: Feature vector file (2.6 GB; 6.1 GB decompressed) in SVMLight format, gzip-compressed. It can be loaded directly by most ML libraries (e.g., scikit-learn's `load_svmlight_file` function and LibSVM). Class labels: {0: "not bully", 1: "harasser", 2: "victim", 3: "defender"}.
- **NL_devset_holdout_indices.json**: JSON holdout indices (799 KB) for splitting off the same held-in and holdout instance sets as in the paper's experiments. The indices refer to rows of the SVMLight file.
- **NL_feature_map_dict.pkl**: Feature type mapping dictionary (32 MB) that maps the columns of the SVMLight file to their feature types (e.g., word 3-grams: columns 0-14230). This file is a Python 2.7.12 object serialized with the standard `pickle` module.

## Overview of inter-annotator metrics

Here we provide an Excel spreadsheet with several inter-annotator agreement metrics for the participant-role-labeled dataset.

- **cyberbullying_participant_role_2018_interannotatorscores.xlsx**: Excel spreadsheet of inter-annotator metrics (7 KB).

## Overview of all tested system results

We tested many machine learning pipeline configurations for each language. Due to space restrictions, not all system results could be reported in the paper; here we provide an Excel spreadsheet with all of them.

- **cyberbullying_participant_role_2018_all_results.xlsx**: Excel spreadsheet of all results (17 KB).

## Contact

Please contact us with any questions regarding this research or the data provided. Contact information can be found at the following link:

- [Gilles Jacobs][1]

Copyright © 2020 Language and Translation Technology Team. All rights reserved.

[1]: https://orcid.org/0000-0001-8846-3015
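The EN and NL feature vector files described above are plain SVMLight files, so they can be read in Python with scikit-learn. The sketch below is a minimal, unofficial example, not code taken from the authors' repositories. It assumes that the `*_devset_holdout_indices.json` files contain a flat list of integer row indices. That JSON structure is not documented on this page, so inspect the file and adapt `load_holdout_indices` if it differs. The sketch also uses `encoding="latin1"` so that Python 3 can unpickle the Python 2.7 feature map. The helper function names and the toy file names in the demo are hypothetical.

```python
import gzip
import json
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import dump_svmlight_file, load_svmlight_file

# Class labels as documented for both the EN and NL feature vector files.
LABELS = {0: "not bully", 1: "harasser", 2: "victim", 3: "defender"}


def load_feature_vectors(path):
    """Load an SVMLight file; scikit-learn decompresses paths ending in .gz on the fly."""
    X, y = load_svmlight_file(path)  # X: scipy.sparse CSR matrix, y: float64 labels
    return X, y.astype(int)


def load_holdout_indices(path):
    """ASSUMPTION: the JSON file holds a flat list of integer row indices."""
    with open(path, encoding="utf-8") as f:
        return [int(i) for i in json.load(f)]


def split_heldin_holdout(X, y, holdout_idx):
    """Split rows into held-in and holdout sets; indices refer to SVMLight file rows."""
    is_holdout = np.zeros(X.shape[0], dtype=bool)
    is_holdout[np.asarray(holdout_idx, dtype=int)] = True
    heldin_rows = np.flatnonzero(~is_holdout)
    holdout_rows = np.flatnonzero(is_holdout)
    return X[heldin_rows], y[heldin_rows], X[holdout_rows], y[holdout_rows]


def load_feature_map(path):
    """Unpickle the Python 2.7 feature map; latin1 decodes Python 2 byte strings."""
    with open(path, "rb") as f:
        return pickle.load(f, encoding="latin1")


if __name__ == "__main__":
    # Toy data standing in for the real EN_/NL_ files (hypothetical names, temp dir).
    rng = np.random.default_rng(0)
    X_toy = rng.random((6, 4))
    y_toy = np.array([0, 1, 2, 3, 0, 1])
    with tempfile.TemporaryDirectory() as tmp:
        svm_path = os.path.join(tmp, "toy_feature_vectors.svm.gz")
        with gzip.open(svm_path, "wb") as f:
            dump_svmlight_file(X_toy, y_toy, f)
        idx_path = os.path.join(tmp, "toy_holdout_indices.json")
        with open(idx_path, "w", encoding="utf-8") as f:
            json.dump([1, 4], f)
        X, y = load_feature_vectors(svm_path)
        X_in, y_in, X_out, y_out = split_heldin_holdout(X, y, load_holdout_indices(idx_path))
        print(X_in.shape, X_out.shape, [LABELS[int(c)] for c in y_out])
```

With the real data, the same calls apply to the downloaded paths, e.g. `load_feature_vectors("EN_feature_vectors.svm.gz")`. Because the EN file is 9 GB decompressed, loading it fully into memory may require considerable RAM. Splitting a single loaded matrix by row index keeps the held-in and holdout sets in the same feature space, so no `n_features` alignment is needed.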