This OSF project is associated with the following working paper, which is available on [PsyArXiv](https://psyarxiv.com/ns4q9/):

- Sönning, Lukas. 2023. *Advancing our understanding of dispersion measures in corpus research*. PsyArXiv. https://psyarxiv.com/ns4q9/

Here is the **abstract**:

- *This paper offers a survey of recent corpus-based work, which shows that dispersion is typically measured across the text files in a corpus. Systematic insights into the behavior of measures in such distributional settings are currently lacking, however. After a thorough discussion of six prominent indices, we investigate their behavior on relevant frequency distributions, which are designed to mimic actual corpus data. Our evaluation considers different distributional settings, i.e. various combinations of frequency and dispersion values. The primary focus is on the response of measures to relatively high and low sub-frequencies, i.e. texts in which the item or structure of interest is over- or underrepresented (if not absent). We develop a simple method for constructing sensitivity profiles, which allow us to draw instructive comparisons among measures. We observe that these profiles vary considerably across distributional settings. While D, DA and DP appear to show the most balanced response contours, our findings suggest that much work remains to be done to understand the performance of measures on items with normalized frequencies below 100 per million words.*

For the documentation of the analyses in the paper, we tried to follow the **TIER protocol 4.0** (https://www.projecttier.org/tier-protocol/). The file **00ReadMe.pdf** gives instructions for reproducing the analyses.

All **R scripts** (see folder "scripts") should be commented in sufficient detail, and are available both as a Quarto (RMarkdown) file and as an HTML file. The HTML files need to be downloaded and then opened in a web browser.

**Images** created for this study can be found in the folder "output/figures". They are published under a Creative Commons Attribution 4.0 licence (**CC BY 4.0**), which means that the licence terms for their use are quite generous (see http://creativecommons.org/licenses/by/4.0).