Count regression models for keyness analysis

doi:10.17605/OSF.IO/MC26T

The study was presented at ICAME43 in Cambridge, UK. The **presentation slides** can be found here: https://osf.io/9hbf3

The manuscript is available as a **preprint** on *PsyArXiv* (https://psyarxiv.com/25mwj/):

- Sönning, Lukas. (in review). Count regression models for keyness analysis. *PsyArXiv preprint*.

**Data** used in the study have been (or will be) published on TROLLing. Since work using the first two datasets is in review, only anonymized versions of these can currently be accessed (use second link):

- Sönning, Lukas. 2022. Key verbs in academic writing: Dataset for "Evaluation of keyness metrics: Reliability and interpretability", https://doi.org/10.18710/EUXSMW, DataverseNO, DRAFT VERSION. [An anonymized version of the dataset is available at https://dataverse.no/privateurl.xhtml?token=757450e1-4ff4-4c7d-a3df-d90453a61e39]
- Sönning, Lukas. 2022. Biber et al.'s (2016) set of 150 BNC items for the analysis of dispersion measures: Dataset for "Evaluation of text-level measures of lexical dispersion", https://doi.org/10.18710/MNVB36, DataverseNO, DRAFT VERSION. [An anonymized version of the dataset is available at https://dataverse.no/privateurl.xhtml?token=a25d30a0-6067-4989-837a-19468c9fa661]
- Sönning, Lukas, and Manfred Krug. 2021. Actually in Contemporary British Speech: Data from the Spoken BNC Corpora. https://doi.org/10.18710/A3SATC, DataverseNO, V1.

**Images** created for this study can be found in the folder "figures". They are published under a Creative Commons Attribution 4.0 licence (**CC BY 4.0**), which permits sharing and adaptation as long as appropriate credit is given (see http://creativecommons.org/licenses/by/4.0).

The folder **R scripts** contains the following files:

- **keyness_regression_illustration**: A short **tutorial** showing how to run negative binomial regression models using the R package "gamlss" (https://cran.r-project.org/web/packages/gamlss/index.html), available as an html file (https://osf.io/tqchs) or an RMarkdown script (https://osf.io/kmyrd).
- **coca_data_preparation**: The preparation of the data set (addition of text metadata, selection of verb lemmas for analysis, implementation of the randomization scheme), available as an html file (https://osf.io/2d8t4) or an RMarkdown script (https://osf.io/vdx53).
- **script_paper_keyness_regression**: R code for reproducing the analyses and figures in the manuscript, again in two versions: html () or RMarkdown ().
- **keyness_computations**: An RMarkdown file describing the simulation study that was run to assess the coverage properties of the procedure.
