This project contains the code for training and validating three graphical models — a Hidden Markov Model, a Maximum-Entropy Markov Model, and a Conditional Random Field — that are featured in the article ["Learning What’s in a Name with Graphical Models"](https://graphical-models.netlify.app).
The code has been tested with Python versions up to and including `3.10.13`. Install the required packages from the `requirements.txt` file (for example, with `pip install -r requirements.txt`).
All three models are trained and validated on the same dataset, [CoNLL-2003](https://aclanthology.org/W03-0419.pdf). The dataset is not included in this repository, but it can be downloaded either [directly](https://data.deepai.org/conll2003.zip) or [via TensorFlow](https://www.tensorflow.org/datasets/catalog/conll2003). After downloading, decompress the zip file and copy `train.txt` and `test.txt` into the `data/` directory.
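The CoNLL-2003 text files hold one token per line in four whitespace-separated columns (word, part-of-speech tag, chunk tag, named-entity tag). Blank lines separate sentences, and `-DOCSTART-` lines mark document boundaries. The following is a minimal sketch of a reader for that layout. It is not taken from this project's notebooks, and the function name `read_conll2003` is hypothetical. It keeps only the word and the named-entity tag, since those are what a sequence tagger is trained on.

```python
from pathlib import Path


def read_conll2003(path):
    """Parse a CoNLL-2003 file into a list of sentences (hypothetical helper).

    Each sentence is a list of (token, ner_tag) pairs. Assumes the standard
    four-column layout "token POS chunk NER", blank lines between sentences,
    and "-DOCSTART-" lines marking document boundaries.
    """
    sentences, current = [], []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            # A blank line or a document marker closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # First column is the word; the last column is the named-entity tag.
        current.append((columns[0], columns[-1]))
    if current:
        # The file may not end with a blank line.
        sentences.append(current)
    return sentences
```

With the files in place, `read_conll2003("data/train.txt")` would return the training sentences ready for feature extraction.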
The trained models are serialized (using the `dill` Python package) and stored in `pickles/`.
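`dill` extends Python's `pickle` and exposes the same `dump`/`load` interface, but it can also serialize objects that `pickle` rejects, such as lambdas. Trained models that hold feature functions often contain such objects. The following is a minimal sketch of save and load helpers under that assumption. The helper names and the file name used below are hypothetical, not the ones the notebooks use. The `pickle` fallback exists only so the sketch runs without `dill` installed and will fail on objects that only `dill` can handle.

```python
from pathlib import Path

try:
    import dill as serializer  # the package this project uses (see requirements.txt)
except ImportError:
    # Fallback for this sketch only: pickle shares dill's dump/load API but
    # cannot serialize everything dill can (e.g. lambdas).
    import pickle as serializer


def save_model(model, path):
    """Serialize `model` to `path`, creating parent directories as needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        serializer.dump(model, f)


def load_model(path):
    """Deserialize and return the object stored at `path`."""
    with Path(path).open("rb") as f:
        return serializer.load(f)
```

For example, `save_model(model, "pickles/hmm.pkl")` followed later by `load_model("pickles/hmm.pkl")` mirrors the train-then-validate hand-off, where `hmm.pkl` is an illustrative name.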
The Jupyter notebooks for each model (`hmm…`/`memm…`/`crf…`) have a `-train` or `-validate` suffix. Run the `-train` notebooks first; each one dumps its trained model into `pickles/`. The `-validate` notebooks then load the corresponding models from `pickles/` and compute evaluation metrics, including accuracy, precision, and recall.
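Accuracy, precision, and recall can be defined at several granularities. The following is a minimal sketch of one common token-level variant, not necessarily the definition used in the `-validate` notebooks. Every tag other than `O` counts as an entity token, and a predicted entity token is correct only if its tag matches the gold tag exactly. The function name `token_metrics` is hypothetical.

```python
def token_metrics(gold, predicted, outside="O"):
    """Token-level accuracy, precision and recall for flat tag sequences.

    Precision and recall treat every non-`outside` tag as an entity token;
    a predicted entity token counts as correct only if its tag matches gold.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    correct = sum(g == p for g, p in zip(gold, predicted))
    # Entity tokens whose predicted tag exactly matches the gold tag.
    true_pos = sum(g == p != outside for g, p in zip(gold, predicted))
    predicted_entities = sum(p != outside for p in predicted)
    gold_entities = sum(g != outside for g in gold)
    return {
        "accuracy": correct / len(gold) if gold else 0.0,
        "precision": true_pos / predicted_entities if predicted_entities else 0.0,
        "recall": true_pos / gold_entities if gold_entities else 0.0,
    }
```

Per-sentence tag lists need to be flattened into one list before calling this function. The official CoNLL-2003 evaluation scores whole entity spans rather than individual tokens, so figures from this sketch are not directly comparable to span-level scores.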
The metrics calculated in the `-validate` notebooks should match those reported in ["Learning What’s in a Name with Graphical Models"](https://graphical-models.netlify.app).