Date created: 2022-07-26 02:20 PM | Last Updated: 2025-01-17 10:20 AM

Identifier: DOI 10.17605/OSF.IO/MC26T

Category: Project

Description: This OSF project accompanies a research paper that explores an approach to keyword analysis based on regression modeling. Specifically, we use a form of negative binomial regression, which offers a number of advantages compared to existing methods for identifying typical items in a target corpus. First, it is responsive to the multidimensional nature of keyness and can address multiple aspects of typicalness simultaneously, using a single statistical model. Further, metrics of interest can be enriched with confidence intervals, which allows us to isolate descriptive and inferential indicators of keyness. Finally, all quantities are based on a text-level analysis, which accounts for the fact that corpora consist of text files and adjusts statistical estimates accordingly. As an illustrative case study, we use data from COCA to identify key verbs in academic writing. To assess the performance of our method, we monitor the coverage rate of the 95% confidence intervals and observe that, for our analysis task, this model seems to be adequate for purposes of statistical inference. Due consideration is also given to the limitations of this procedure, and we conclude by outlining the kinds of keyness analyses for which count regression models may be a worthwhile approach.
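To make the idea of a text-level count regression more concrete, the following is a minimal sketch of a generic negative binomial (NB2) likelihood in which each text contributes one count and its length enters as an offset. It is an illustration under stated assumptions, not necessarily the specification used in the paper: the function names, the parameter names (`beta0`, `beta1`, `phi`), and the single binary predictor for target vs. reference corpus are all hypothetical.

```python
import math

def nb_logpmf(y, mu, phi):
    """Log-probability of count y under an NB2 distribution with mean mu
    and shape parameter phi (variance = mu + mu**2 / phi)."""
    return (math.lgamma(y + phi) - math.lgamma(phi) - math.lgamma(y + 1)
            + phi * math.log(phi / (phi + mu))
            + y * math.log(mu / (phi + mu)))

def expected_count(n_tokens, is_target, beta0, beta1):
    """Expected count for one text, using text length as an offset:
    log(mu) = log(n_tokens) + beta0 + beta1 * is_target.
    Under this assumed setup, exp(beta1) is the rate ratio of the
    target corpus relative to the reference corpus."""
    return n_tokens * math.exp(beta0 + beta1 * is_target)

def log_likelihood(texts, beta0, beta1, phi):
    """Sum of text-level log-probabilities.
    texts: iterable of (count, n_tokens, is_target) tuples, one per text."""
    return sum(
        nb_logpmf(y, expected_count(n, x, beta0, beta1), phi)
        for y, n, x in texts
    )
```

In this sketch, maximizing `log_likelihood` over `beta0`, `beta1`, and `phi` would yield the estimated rate ratio; in practice one would typically rely on an established statistics package for fitting and for computing confidence intervals.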

License: CC-By Attribution 4.0 International

Has supplemental materials for Count regression models for keyness analysis on PsyArXiv

Wiki

The study was presented at ICAME43 in Cambridge, UK. The presentation slides can be found here: https://osf.io/9hbf3

The manuscript is available as a preprint on PsyArXiv (https://psyarxiv.com/25mwj/):

  • Sönning, Lukas. (in review). Count regression models for keyness analysis. PsyArXiv preprint.

Data used in the study have been (or will be) published on TROLLing. Since work using the first two da…

Tags

COCA, corpus, corpus design, corpus linguistics, Corpus of Contemporary American English, count regression, dispersion, dispersion measures, frequency, keyness, keyword analysis, keywords, lexical dispersion, methodology, negative binomial regression, text-level analysis, vocabulary list, word frequency lists, word importance
