# Replication Data for Fine-Grained Implicit Sentiment in Financial News: Uncovering Hidden Bulls and Bears
This replication repository is split across several sub-repositories containing the source code and data for the experiments in *"Fine-Grained Implicit Sentiment in Financial News: Uncovering Hidden Bulls and Bears"* by Gilles Jacobs and Véronique Hoste.
## 1. Pre-processing with `sentivent_webannoparser`
Contains scripts for parsing and pre-processing the original WebAnno export into the data formats required for the experiments in the paper:
https://github.com/GillesJ/sentivent_webannoparser
NOTE: The original SENTiVENT WebAnno export files will be released after the project ends (end of 2021).
Use:
- Run `python parse_to_implicit_polar.py` to pre-process the original WebAnno export for the coarse-grained gold polar expression polarity classification experiments.
- Run `python parse_to_clause.py` to pre-process for the coarse-grained clause-based experiments.
- Run `python parse_to_gts.py` to pre-process for the fine-grained triplet BIO format.
- Several other self-documenting scripts cover inter-annotator agreement (IAA) and visualization.
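To illustrate the fine-grained triplet BIO target format: a sentiment triplet pairs a target span and a polar-expression span with a polarity, and each span can be encoded as BIO tags over the sentence tokens. The sketch below is purely illustrative; the actual schema produced by `parse_to_gts.py` may differ.

```python
# Hypothetical sketch of encoding one sentiment triplet as BIO tags.
# The real output format of parse_to_gts.py may differ.

def spans_to_bio(n_tokens, span, label):
    """Encode a half-open (start, end) token span as BIO tags, 'O' elsewhere."""
    tags = ["O"] * n_tokens
    start, end = span
    tags[start] = f"B-{label}"
    for i in range(start + 1, end):
        tags[i] = f"I-{label}"
    return tags

tokens = ["Apple", "crushed", "earnings", "expectations"]
# Example triplet: target span, polar-expression span, polarity.
triplet = {"target": (0, 1), "expression": (1, 2), "polarity": "POS"}

target_tags = spans_to_bio(len(tokens), triplet["target"], "TARGET")
expr_tags = spans_to_bio(len(tokens), triplet["expression"], triplet["polarity"])
print(target_tags)  # ['B-TARGET', 'O', 'O', 'O']
print(expr_tags)    # ['O', 'B-POS', 'O', 'O']
```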
## 2. Coarse-grained experiments repository
Contains replication data in the `/data` subfolder and the hyperparameter-optimization, training, testing, and validation source code. Several utility scripts for pre-processing lexicons are also included, but the lexicons themselves cannot be redistributed due to copyright restrictions.
https://github.com/GillesJ/sentivent-implicit-economic-sentiment
Use:
- Model training and tokenization code is in `custom_model.py` and `custom_classification_model.py`; run the hyperparameter search with `python hyperopt_model_train.py`.
- `/data/`: coarse-grained datasets in JSON format.
- `/utils/`: utility scripts for visualization, lexicon pre-processing, and exploratory data analysis (EDA).
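The exact JSON schema is defined in the repository itself; as a rough sketch under the assumption that each record carries a text and a polarity label (hypothetical field names), a split could be inspected like this:

```python
import json
import os
import tempfile
from collections import Counter

# Hypothetical records; the real field names in /data/ may differ.
records = [
    {"text": "Shares surged after the earnings beat.", "label": "positive"},
    {"text": "The company missed guidance again.", "label": "negative"},
]

# Round-trip through a JSON file, the way a /data/ split would be read.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(records, f)
    path = f.name

with open(path) as f:
    data = json.load(f)
os.unlink(path)

print(Counter(r["label"] for r in data))  # label distribution of the split
```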
## 3. Fine-grained triplet experiments repository
Contains replication data in the `/data` subfolder and the hyperparameter-optimization, training, testing, and validation source code for triplet extraction. This repository is a fork of the original GTS work by Wu et al. (2020); make sure you are on the `sentivent` branch.
https://github.com/GillesJ/GTS/tree/sentivent
Use:
- Run `python src/hyperopt.py` to run the hyperparameter search. Collect results into tables with the `collect_` scripts.
## 4. Hyperparameter optimization search individual results
We used wandb.ai's built-in hyperparameter sweep functionality with Hyperband early stopping. The implementation is in each experiment series' code base. Each encoder model + lexicon feature group has its own project page with runs. The pages listed below allow inspection of individual runs:
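A minimal sketch of such a sweep configuration is shown below. The parameter names and ranges are hypothetical, not the ones used in the paper; the actual search spaces are defined in each experiment repository.

```python
# Hypothetical wandb sweep configuration with Hyperband early stopping;
# the real search spaces live in the experiment repositories.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "eval_f1", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"min": 1e-6, "max": 1e-4},
        "num_train_epochs": {"values": [3, 5, 10]},
    },
    "early_terminate": {"type": "hyperband", "min_iter": 3},
}

# Registering and launching the sweep would look like:
#   import wandb
#   sweep_id = wandb.sweep(sweep_config, project="senti-nolex-roberta-base")
#   wandb.agent(sweep_id, function=train_fn)
```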
- No lex.: No lexicon features.
- Econ.: Economic domain lexicons.
- All: General-domain and economic domain lexicon features.
### 4.1. Coarse-grained gold polar expressions
- RoBERTa-large: [No lex.](https://wandb.ai/gillesjacobs/senti-nolex-roberta-large), [Econ.](https://wandb.ai/gillesjacobs/senti-lexecon-roberta-large), [All](https://wandb.ai/gillesjacobs/senti-lexall-roberta-large)
- RoBERTa-base: [No lex.](https://wandb.ai/gillesjacobs/senti-nolex-roberta-base), [Econ.](https://wandb.ai/gillesjacobs/senti-lexecon-roberta-base), [All](https://wandb.ai/gillesjacobs/senti-lexall-roberta-base)
- BERT-large: [No lex.](https://wandb.ai/gillesjacobs/senti-nolex-bert-large-cased), [Econ.](https://wandb.ai/gillesjacobs/senti-lexecon-bert-large-cased), [All](https://wandb.ai/gillesjacobs/senti-lexall-bert-large-cased)
- BERT-base: [No lex.](https://wandb.ai/gillesjacobs/senti-nolex-bert-base-cased-fix), [Econ.](https://wandb.ai/gillesjacobs/senti-lexecon-bert-base-cased-fix), [All](https://wandb.ai/gillesjacobs/senti-lexall-bert-base-cased-fix)
- DeBERTa-base: [No lex.](https://wandb.ai/gillesjacobs/senti-nolex-microsoft-deberta-base), [Econ.](https://wandb.ai/gillesjacobs/senti-lexecon-microsoft-deberta-base), [All](https://wandb.ai/gillesjacobs/senti-lexall-microsoft-deberta-base)
- FinBERT-FinVocab: [No lex.](https://wandb.ai/gillesjacobs/senti-nolex-finbert-finvocab-uncased), [Econ.](https://wandb.ai/gillesjacobs/senti-lexecon-finbert-finvocab-uncased), [All](https://wandb.ai/gillesjacobs/senti-lexall-finbert-finvocab-uncased)
- FinBERT-TRC2+FP: [No lex.](https://wandb.ai/gillesjacobs/senti-nolex-ProsusAI-finbert), [Econ.](https://wandb.ai/gillesjacobs/senti-lexecon-ProsusAI-finbert), [All](https://wandb.ai/gillesjacobs/senti-lexall-ProsusAI-finbert)
### 4.2. Coarse-grained clause-based experiments
- BERT-large: [No lex.](https://wandb.ai/gillesjacobs/impliclaus-nolex-bert-large-cased), [Econ.](https://wandb.ai/gillesjacobs/impliclaus-lexecon-bert-large-cased), [All](https://wandb.ai/gillesjacobs/impliclaus-lexall-bert-large-cased)
- BERT-base: [No lex.](https://wandb.ai/gillesjacobs/impliclaus-nolex-bert-base-cased), [Econ.](https://wandb.ai/gillesjacobs/impliclaus-lexecon-bert-base-cased), [All](https://wandb.ai/gillesjacobs/impliclaus-lexall-bert-base-cased)
- FinBERT-FinVocab: [No lex.](https://wandb.ai/gillesjacobs/impliclaus-nolex-finbert-finvocab-uncased), [Econ.](https://wandb.ai/gillesjacobs/impliclaus-lexecon-finbert-finvocab-uncased), [All](https://wandb.ai/gillesjacobs/impliclaus-lexall-finbert-finvocab-uncased)
- RoBERTa-base: [No lex.](https://wandb.ai/gillesjacobs/impliclaus-nolex-roberta-base), [Econ.](https://wandb.ai/gillesjacobs/impliclaus-lexecon-roberta-base), [All](https://wandb.ai/gillesjacobs/impliclaus-lexall-roberta-base)
- DeBERTa-base: [No lex.](https://wandb.ai/gillesjacobs/impliclaus-nolex-microsoft-deberta-base), [Econ.](https://wandb.ai/gillesjacobs/impliclaus-lexecon-microsoft-deberta-base), [All](https://wandb.ai/gillesjacobs/impliclaus-lexall-microsoft-deberta-base)
- FinBERT-TRC2+FP: [No lex.](https://wandb.ai/gillesjacobs/impliclaus-nolex-ProsusAI-finbert), [Econ.](https://wandb.ai/gillesjacobs/impliclaus-lexecon-ProsusAI-finbert), [All](https://wandb.ai/gillesjacobs/impliclaus-lexall-ProsusAI-finbert)
- RoBERTa-large: [No lex.](https://wandb.ai/gillesjacobs/impliclaus-nolex-roberta-large), [Econ.](https://wandb.ai/gillesjacobs/impliclaus-lexecon-roberta-large), [All](https://wandb.ai/gillesjacobs/impliclaus-lexall-roberta-large)
### 4.3. Fine-grained triplet experiments
SENTiVENT (ours):
- [DeBERTa-base](https://wandb.ai/gillesjacobs/microsoft_deberta_base-triplet-sentivent)
- [FinBERT-TRC2+FP](https://wandb.ai/gillesjacobs/prosusai_finbert-triplet-sentivent)
- [BERT-large](https://wandb.ai/gillesjacobs/bert_large_cased-triplet-sentivent)
- [BERT-base](https://wandb.ai/gillesjacobs/bert_base_cased-triplet-sentivent)
- [RoBERTa-base](https://wandb.ai/gillesjacobs/roberta_base-triplet-sentivent)
- [RoBERTa-large](https://wandb.ai/gillesjacobs/roberta_large-triplet-sentivent)
Explicit-sentiment datasets of Wu et al. (2020):
- [RoBERTa-base](https://wandb.ai/gillesjacobs/roberta_base-triplet-joinedsemeval)
- [RoBERTa-large](https://wandb.ai/gillesjacobs/roberta_large-triplet-joinedsemeval)