This is the code to replicate the analyses presented in 'Polysemy through the lens of psycholinguistic variables: a dataset and an evaluation of static and contextualized language models', submitted for peer review at *SEM 2024.

The original dataset collected and presented in the paper is available under a Creative Commons Attribution 4.0 International (CC-BY 4.0) license in the folder 'dataset', which contains one file with the selected phrases ('phrases.txt') and one with the raw ratings ('raw_dataset.tsv'). Plots and results are already available in the corresponding folders ('results', 'plots', 'distributions_plots'). Vectors for the polysemous words are in the folder 'vectors'.

To replicate the analyses reported in the paper, it is enough to run the .py files in the main folder in numerical sequence (i.e. 01, 02, 03, etc.). Scripts for the earlier stages of the pipeline described in the paper are also available, in the 'preparation_scripts' folder.

Note that extracting vectors from the language models requires the original models to be available in a 'models' folder. Re-implementing the count-pmi model from scratch requires the PUKWac dataset; to avoid this, pre-counted frequencies and co-occurrences from PUKWac are provided in the 'pickles' folder.
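Running the numbered scripts in order can also be automated. The following is a minimal sketch and not part of the repository: it assumes that the analysis scripts are the .py files in the main folder whose names begin with a number (the exact file names are not listed here), and the helper names `numbered_scripts` and `run_in_sequence` are hypothetical.

```python
import re
import subprocess
import sys
from pathlib import Path


def numbered_scripts(folder="."):
    """Return the .py files in `folder` whose names start with digits, in numeric order.

    Assumption: the analysis scripts are named with a leading number
    (01, 02, 03, ...); files without a leading number are ignored.
    """
    found = []
    for path in Path(folder).glob("*.py"):
        match = re.match(r"(\d+)", path.name)
        if match:
            # Sort on the integer value so that 10 comes after 2.
            found.append((int(match.group(1)), path.name, path))
    return [path for _, _, path in sorted(found)]


def run_in_sequence(folder="."):
    """Run each numbered script with the current interpreter, stopping at the first failure."""
    for script in numbered_scripts(folder):
        print(f"Running {script.name}")
        # check=True raises CalledProcessError if a script exits with an error,
        # so later scripts never run on incomplete intermediate results.
        subprocess.run([sys.executable, script.name], cwd=folder, check=True)


# Example (run from the main folder of the repository):
# run_in_sequence(".")
```

Sorting on the parsed integer rather than on the file name keeps the order correct even if the numbering is not zero-padded.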