This project contains the code for training and validating three graphical models (a Hidden Markov Model, a Maximum-Entropy Markov Model, and a Conditional Random Field) that are featured in the article ["Learning What’s in a Name with Graphical Models"](https://graphical-models.netlify.app).

The code has been tested with Python versions up to and including `3.10.13`. Install the required packages from the `requirements.txt` file.

All three models are trained and validated on the same dataset, [CoNLL-2003](https://aclanthology.org/W03-0419.pdf). The dataset is not included in this repository, but it can be downloaded either [directly](https://data.deepai.org/conll2003.zip) or [via TensorFlow](https://www.tensorflow.org/datasets/catalog/conll2003). After downloading, decompress the zip file and copy `train.txt` and `test.txt` to the `data/` directory.

The trained models are serialized (using the `dill` Python package) and stored in `pickles/`. The Jupyter notebooks for each model (`hmm…`/`memm…`/`crf…`) have a `-train` or `-validate` suffix. Run the `-train` notebooks first; each one dumps its trained model into `pickles/`. The `-validate` notebooks then load the corresponding models from `pickles/` and calculate the relevant metrics, including accuracy, precision, and recall. The metrics calculated in the `-validate` notebooks should match those reported in ["Learning What’s in a Name with Graphical Models"](https://graphical-models.netlify.app).
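To make the data format and the metrics a little more concrete, here is a minimal sketch that is not taken from the repository. It assumes the standard CoNLL-2003 layout (one token per line with four whitespace-separated columns: token, POS tag, chunk tag, NER tag; blank lines between sentences; `-DOCSTART-` lines between documents). The function names `parse_conll` and `token_metrics` are hypothetical, and the metrics are computed per token for a single tag, which may differ from how the notebooks compute them.

```python
def parse_conll(lines):
    """Group CoNLL-2003 style lines into sentences of (token, ner_tag) pairs.

    Hypothetical helper for illustration; not part of the repository.
    """
    sentences, current = [], []
    for raw in lines:
        line = raw.strip()
        # Blank lines end a sentence; -DOCSTART- lines mark document boundaries.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split()
        # First column is the token, last column is the NER tag.
        current.append((fields[0], fields[-1]))
    if current:
        sentences.append(current)
    return sentences


def token_metrics(gold, predicted, tag):
    """Return (accuracy, precision, recall) at the token level.

    Accuracy is over all tokens; precision and recall are for `tag` only.
    This is an assumed definition, not necessarily the one the notebooks use.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    correct = sum(g == p for g, p in pairs)
    true_pos = sum(g == tag and p == tag for g, p in pairs)
    false_pos = sum(g != tag and p == tag for g, p in pairs)
    false_neg = sum(g == tag and p != tag for g, p in pairs)
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return accuracy, precision, recall
```

With the dataset in place, something like `parse_conll(open("data/test.txt", encoding="utf-8"))` would yield the gold sentences. Flattening their tags and pairing them with a model's predicted tags gives the two lists that `token_metrics` expects.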