This project contains materials for the July 18, 2023 workshop "Quantitative analysis for corpus phonetics and phonology", part
of the series [Unlaboratory Phonology: Corpus Approaches][1] sponsored by the [Association for Laboratory Phonology][2].

Much of the workshop draws on my book [*Regression Modeling for Linguistic Data*][3] (MIT Press, 2023), abbreviated *RMLD* below. A preprint is still available [here][4], along with all data and code used in the book.

**Datasets** used in the workshop, all from the book:
* `vot` (CC BY 4.0 license, described in *RMLD* 5.1.2/A.2)
* `french_cdi_24` (derived from [Wordbank][5], CC BY 4.0 license, described in *RMLD* 7.1.2/9.7.2)
* `turkish_if0` (CC BY 4.0 license, described in *RMLD* 10.1.2)

**Slides**: `corpusStatsTutorial_slides.pdf`

**Code**: `corpus_stats_tutorial_code.R`

**Topics**:
1. *Visualization*: some aspects which are especially relevant for corpus data
2. *Variable selection*: theoretical background, model comparison, choosing a set of predictors
3. *Unpacking results*: multi-level factors, post-hoc tests, interactions
4. *Mixed-effects models*: lesser-known uses of random effects, practical issues (e.g. convergence), model selection

For each topic, I focus on the aspects most relevant to corpus data. No topic is covered comprehensively.

[1]: https://labphon.org/content/unlaboratory-phonology-corpus-approaches-summer-2023
[2]: https://labphon.org/
[3]: https://mitpress.mit.edu/9780262045483/regression-modeling-for-linguistic-data/
[4]: https://osf.io/3827m
[5]: http://wordbank.stanford.edu/faq