Count regression models for keyness analysis
Category: Project
Description: This OSF project accompanies a research paper that explores an approach to keyword analysis based on regression modeling. Specifically, we use a form of negative binomial regression, which offers a number of advantages over existing methods for identifying typical items in a target corpus. First, it is responsive to the multidimensional nature of keyness and can address multiple aspects of typicalness simultaneously, using a single statistical model. Second, metrics of interest can be supplemented with confidence intervals, which allows us to separate descriptive from inferential indicators of keyness. Finally, all quantities are based on a text-level analysis, which accounts for the fact that corpora consist of individual texts and adjusts statistical estimates accordingly. As an illustrative case study, we use data from COCA to identify key verbs in academic writing. To assess the performance of our method, we monitor the coverage rate of the 95% confidence intervals and observe that, for our analysis task, the model appears to be adequate for purposes of statistical inference. We also discuss the limitations of this procedure and conclude by outlining the kinds of keyness analyses for which count regression models may be a worthwhile approach.
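The idea of monitoring the coverage rate of 95% confidence intervals can be illustrated with a small simulation. The following is a minimal sketch and not the procedure used in the paper: it swaps the negative binomial regression model for a simple text-level Wald interval around the mean count per text, assumes all texts have the same length (so no offset is needed), and draws counts from a gamma-Poisson mixture, which is one way of generating negative binomial data. All function names and parameter values are hypothetical and chosen for illustration only.

```python
import math
import random


def coverage_rate(intervals, true_value):
    """Share of (lower, upper) intervals that contain true_value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


def draw_poisson(mean, rng):
    """Knuth's multiplication method; adequate for the small means used here."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_text_counts(n_texts, mean_count, shape, rng):
    """One count per text from a gamma-Poisson mixture (negative binomial)."""
    counts = []
    for _ in range(n_texts):
        # Text-specific mean, gamma-distributed with expectation mean_count.
        text_mean = rng.gammavariate(shape, mean_count / shape)
        counts.append(draw_poisson(text_mean, rng))
    return counts


def wald_interval(counts, z=1.96):
    """Approximate 95% interval for the mean count, treating texts as units."""
    n = len(counts)
    mean = sum(counts) / n
    variance = sum((c - mean) ** 2 for c in counts) / (n - 1)
    half_width = z * math.sqrt(variance / n)
    return mean - half_width, mean + half_width


def simulated_coverage(n_replicates=500, n_texts=200, mean_count=5.0,
                       shape=2.0, seed=1):
    """Fraction of simulated corpora whose interval covers the true mean."""
    rng = random.Random(seed)
    intervals = [
        wald_interval(simulate_text_counts(n_texts, mean_count, shape, rng))
        for _ in range(n_replicates)
    ]
    return coverage_rate(intervals, mean_count)


if __name__ == "__main__":
    print(f"Simulated coverage: {simulated_coverage():.3f}")
```

A coverage rate close to 0.95 suggests that the interval procedure is adequate for the simulated setting; a rate well below that would indicate intervals that are too narrow.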