This OSF project is associated with the following study, which is available on [PsyArXiv](https://osf.io/preprints/psyarxiv/rz8qn):

- Sönning, Lukas & Jesse Egbert. 2024. *Sensitivity of dispersion measures to distributional patterns and corpus design.* PsyArXiv. https://doi.org/10.31234/osf.io/rz8qn

The study was presented at *ICAME 2024* in Vigo, Spain. The **presentation slides** can be found [here](https://osf.io/47ued).

This is the **abstract**:

- *While the purpose of dispersion measures is to quantify how evenly an item (or structure) is distributed in a corpus, recent work has shown that indices also respond to other features in the data: Juilland’s D varies systematically with the number of texts (or corpus parts) underlying the analysis, and all commonly used measures respond systematically to the frequency of an item. The present study aims to provide further insights into the sensitivity (or fragility) of dispersion measures to aspects of corpus design and data distribution. Using a simulation study that mimics distributional settings observed in natural language data, we explore how measures respond to differences in corpus design (number of texts, average text length, distribution of text lengths) and distributional milieu (frequency and evenness of distribution). Our results suggest that, within the settings covered by our analysis, the factors frequency and evenness of distribution have roughly the same impact on the observed variability in scores, though there is some variation among measures. The average text length emerges as another feature that leaves its mark on the observed scores. Finally, we note that D2 exhibits the same weakness as D – it varies with the number of units (texts or corpus parts) that enter the analysis.*

**Images** created for this study can be found in the folder "output/figures". They are published under a Creative Commons Attribution 4.0 licence (**CC BY 4.0**), which means that the licence terms for their use are quite generous (see http://creativecommons.org/licenses/by/4.0).
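
The abstract's remark that Juilland's D varies with the number of texts can be illustrated with a minimal sketch. It is not taken from the study's own code: the function names `juilland_d` and `half_pattern` are hypothetical, and the sketch assumes one common formulation of D (one minus the coefficient of variation of the per-part relative frequencies, using the population standard deviation, divided by the square root of the number of parts minus one). The same distributional pattern, an item occurring at a constant rate in every second part, is scored for corpora with different numbers of parts.

```python
import math


def juilland_d(freqs, sizes):
    """Juilland's D from per-part frequencies and part sizes (in words).

    Assumed formulation: D = 1 - CV / sqrt(n - 1), where CV is the
    population SD of the per-part relative frequencies over their mean.
    """
    n = len(freqs)
    if n < 2:
        raise ValueError("need at least two corpus parts")
    rates = [f / s for f, s in zip(freqs, sizes)]
    mean = sum(rates) / n
    if mean == 0:
        raise ValueError("item does not occur in the corpus")
    sd = math.sqrt(sum((r - mean) ** 2 for r in rates) / n)  # population SD
    return 1 - (sd / mean) / math.sqrt(n - 1)


def half_pattern(n):
    """Toy data: the item occurs once in every second part, never elsewhere."""
    freqs = [1 if i % 2 == 0 else 0 for i in range(n)]
    sizes = [1000] * n  # equal-sized parts, an assumption for simplicity
    return freqs, sizes


# Same pattern, different numbers of parts: the score rises with n.
for n in (2, 10, 50):
    print(n, round(juilland_d(*half_pattern(n)), 3))
```

For this toy pattern the coefficient of variation is 1 whenever the number of parts is even, so the score reduces to 1 - 1/sqrt(n - 1) and moves towards 1 as parts are added, even though the item is always found in exactly half of them.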