# SentiFM company-specific economic news event dataset (English)

This repository contains:

1) the original Brat annotation files of the SentiFM company-specific event dataset in English economic news.
2) Replication data for "Gilles Jacobs, Els Lefever, and Véronique Hoste. 2018. Economic event detection in company-specific news text. In Proceedings of the 1st Workshop on Economics and NLP (ECONLP). ACL 2018, Melbourne, AUS, 1-10."

The SentiFM English dataset enables supervised event detection for 10 economic event types. For a description of the English dataset and baseline experiments, see Jacobs et al. (2018), cited above. (An overview will be provided here later.) A Dutch counterpart will be released for crosslingual methods.

Annotations are token-level and, unlike ACE/ERE event mentions, are usually multi-token spans. Events are linked to the earliest preceding company mention with an ‘about\_company’ relation (for MergerAcquisition events, this relation is split into ‘acquiring\_company’ and ‘target\_company’).

When using this dataset please use the following reference:

Gilles Jacobs, Els Lefever, and Véronique Hoste. 2018. Economic event detection in company-specific news text. In Proceedings of the 1st Workshop on Economics and NLP (ECONLP). ACL 2018, Melbourne, AUS, 1-10.

Experiment source code: https://github.com/GillesJ/sentivent-economic-event-detection

# 1. SentiFM English company-specific economic event dataset

- **bratannotationfiles.tar.gz/**: [Brat annotation tool][1] text (.txt) and annotation (.ann) files with token-level labels and relations. For more info see Annotation_guidelines_EN.txt. These files can be imported into Brat. They were created with Brat v1.2, but the data should be compatible with later versions and with WebAnno. Files are gzip compressed in a tarball.
- **bratconf/**: Contains the Brat configuration files for the annotation scheme and visual styling. Needed for Brat import.
- **datacollection.txt**: More info on how and where the data was collected and which keywords were used.
- **Annotation_guidelines_EN.txt**: Annotation guidelines used by the annotators. Contains info on types, subtypes, and relations.

# 2. Experiment Replication Data: Jacobs G., et al. 2018. "Economic Event Detection in Company-Specific News Text".

- **replicationdata/experiment_data.json**: Data instance sentences in a holdin-holdout split with event type labels.
- **replicationdata/SVM_train_test_feature_vector_per_type**: Folder containing feature vector files per event type in the train-test splits used in the SVM-based experiments. SVMLight format, compatible with sklearn and LIBSVM.
- **replicationdata/RNNLSTM_train_test_features.json**: Feature data as used in the RNNLSTM experiments. Contains word index IDs and pre-processed, tokenized sentences. JSON format encoded using the numpy formatter of the jsonpickle package.

# Contact

For questions:

- Gilles Jacobs: https://orcid.org/0000-0001-8846-3015

[1]: http://brat.nlplab.org/
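
For readers who want to work with the Brat annotation files outside of Brat, the following is a minimal sketch of reading the standoff `.ann` format with the Python standard library. It is not part of the released code. It handles only text-bound annotations (`T` lines) and binary relations (`R` lines); other line types are skipped. The sample text, the `Company` entity label, and the argument order of the relations are hypothetical and chosen for illustration only; consult the annotation guidelines and the files in `bratconf/` for the actual labels.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Span:
    id: str
    label: str
    offsets: List[Tuple[int, int]]  # one (start, end) pair per fragment
    text: str


@dataclass
class Relation:
    id: str
    label: str
    args: Dict[str, str]  # e.g. {"Arg1": "T2", "Arg2": "T1"}


def parse_ann(ann_text):
    """Parse text-bound annotations and relations from Brat standoff text."""
    spans, relations = {}, []
    for line in ann_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ann_id = fields[0]
        if ann_id.startswith("T"):
            # "Label start end" or, for discontinuous spans, "Label s1 e1;s2 e2"
            label, _, offset_str = fields[1].partition(" ")
            offsets = [
                tuple(int(n) for n in fragment.split())
                for fragment in offset_str.split(";")
            ]
            text = fields[2] if len(fields) > 2 else ""
            spans[ann_id] = Span(ann_id, label, offsets, text)
        elif ann_id.startswith("R"):
            # "Label Arg1:Tx Arg2:Ty"
            label, *arg_parts = fields[1].split()
            args = dict(part.split(":", 1) for part in arg_parts)
            relations.append(Relation(ann_id, label, args))
    return spans, relations


# Hypothetical example document and annotations (not taken from the dataset).
SAMPLE_TXT = "Acme Corp agreed to buy Widget Inc."
SAMPLE_ANN = "\n".join([
    "T1\tCompany 0 9\tAcme Corp",
    "T2\tMergerAcquisition 10 23\tagreed to buy",
    "T3\tCompany 24 35\tWidget Inc.",
    "R1\tacquiring_company Arg1:T2 Arg2:T1",
    "R2\ttarget_company Arg1:T2 Arg2:T3",
])

spans, relations = parse_ann(SAMPLE_ANN)
for rel in relations:
    event, company = spans[rel.args["Arg1"]], spans[rel.args["Arg2"]]
    print(f"{event.label} '{event.text}' --{rel.label}--> '{company.text}'")
```

To use this on the released data, read each `.ann` file as UTF-8 text and pass its contents to `parse_ann`; the character offsets index into the matching `.txt` file.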