Advancing our understanding of dispersion measures in corpus research
Category: Project
Description: This OSF project is associated with a study on the use of dispersion measures in corpus-based work. The motivation for this study is the observation that while our inventory of dispersion measures is continuously growing, researchers find little guidance for the choice among them. We offer a survey of recent corpus-based work, which shows that dispersion is typically measured across the text files in a corpus. Systematic insights into the behavior of measures in such distributional settings are currently lacking, however. After a thorough discussion of six prominent indices, we investigate their behavior on relevant frequency distributions, which are modeled on actual corpus data. Our evaluation considers different distributional settings, i.e. various combinations of frequency and dispersion values. The primary focus is on the response of measures to relatively high and low sub-frequencies, i.e. texts in which the item or structure of interest is over- or underrepresented (if not absent). We develop a simple method for constructing sensitivity profiles, which allows us to draw instructive comparisons among measures. We observe that these profiles vary considerably across distributional settings. While D and DP appear to show the most balanced response contours, our findings suggest that much work remains to be done to understand the performance of measures on items with normalized frequencies below 100 pmw.
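To make concrete what measuring dispersion across the text files in a corpus involves, here is a minimal sketch of two of the indices named above, D and DP. It assumes the commonly cited definitions (Juilland's D computed from the coefficient of variation of per-text normalized rates, and DP as half the summed absolute differences between observed and expected proportions); the study itself may use different scaling conventions, for example a reversed DP in which higher values indicate more even dispersion. The function names and the toy counts are illustrative only and are not taken from the project materials.

```python
from math import sqrt


def juilland_d(counts, sizes):
    """Juilland's D (assumed standard definition): 1 - CV / sqrt(n - 1).

    counts: occurrences of the item in each text
    sizes:  number of words in each text
    Values near 1 indicate even dispersion, values near 0 concentration.
    """
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two texts")
    # Normalize by text length so that texts of unequal size are comparable.
    rates = [c / s for c, s in zip(counts, sizes)]
    mean = sum(rates) / n
    if mean == 0:
        raise ValueError("item does not occur in the corpus")
    # Population standard deviation of the per-text rates.
    sd = sqrt(sum((r - mean) ** 2 for r in rates) / n)
    return 1 - (sd / mean) / sqrt(n - 1)


def dp(counts, sizes):
    """DP (assumed standard definition): 0.5 * sum(|observed - expected|).

    Observed: share of the item's occurrences found in each text.
    Expected: share of the corpus that each text makes up.
    Under this convention 0 means perfectly even dispersion.
    """
    total = sum(counts)
    corpus = sum(sizes)
    if total == 0:
        raise ValueError("item does not occur in the corpus")
    return 0.5 * sum(abs(c / total - s / corpus) for c, s in zip(counts, sizes))


# Hypothetical toy corpus: four texts of 1,000 words each.
sizes = [1000, 1000, 1000, 1000]
even = [10, 10, 10, 10]    # item spread evenly across texts
clumped = [40, 0, 0, 0]    # all occurrences in a single text

print(juilland_d(even, sizes), dp(even, sizes))
print(juilland_d(clumped, sizes), dp(clumped, sizes))
```

Both toy items have the same overall frequency (40 occurrences in 4,000 words), so any difference in the scores reflects dispersion alone. Note that the two indices run in opposite directions under these definitions, which is one reason comparisons among measures require care.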