It is common practice in the statistical analysis of phonetic data to draw conclusions on the basis of statistical significance, often judged by the size of a p-value. While p-values are used to guard against Type I error (incorrectly concluding that a null effect is real), they do not provide information about other types of error that are also important for interpreting statistical results. In particular, it is possible to fail to detect a true effect, to exaggerate the magnitude of an effect, or even to incorrectly estimate an effect's direction, resulting in erroneous and biased measures of effect size. In this technical report, we focus on three measures related to these errors. The first, power, reflects the probability of failing to detect an effect that in fact exists. The second and third, Type M and Type S errors, measure the extent to which estimates of the magnitude and direction of an effect are inaccurate. We then present a 'design analysis' (Gelman & Carlin, 2014), using data from an experimental study on German incomplete neutralization, to illustrate how power, magnitude, and sign errors vary with sample and effect size. This case study shows how the informativity of research findings can vary substantially in ways that are not always, or even usually, apparent on the basis of a p-value alone. We conclude by reiterating three recommendations for good statistical practice in phonetics, drawn from best practices widely recommended for the social and behavioral sciences: report all results; design studies that will produce high-precision estimates; and conduct direct replications of previous findings.
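The design-analysis logic described above can be sketched by simulation: generate many hypothetical experiments under an assumed true effect, keep only the "significant" ones, and ask how often the effect is detected (power), how exaggerated the significant estimates are (Type M), and how often they have the wrong sign (Type S). The sketch below is illustrative only and is not the analysis reported in the paper (which concerns mixed-effects models of German incomplete neutralization data); the function name, the one-sample t-test setting, and all parameter values are our own assumptions.

```python
import numpy as np
from scipy import stats

def design_analysis(true_effect, sd, n, alpha=0.05, n_sims=10_000, seed=1):
    """Monte Carlo design analysis in the spirit of Gelman & Carlin (2014).

    Simulates n_sims experiments, each with n observations drawn from a
    normal distribution with mean `true_effect` and standard deviation `sd`,
    tested against zero with a one-sample t-test.

    Returns (power, type_m, type_s):
      power  - proportion of experiments reaching p < alpha
      type_m - mean |estimate| among significant results, divided by the
               true effect size (exaggeration ratio)
      type_s - proportion of significant results with the wrong sign
    """
    rng = np.random.default_rng(seed)
    samples = rng.normal(true_effect, sd, size=(n_sims, n))
    est = samples.mean(axis=1)                       # per-experiment estimate
    res = stats.ttest_1samp(samples, popmean=0.0, axis=1)
    sig = res.pvalue < alpha                         # "significant" experiments
    power = sig.mean()
    type_m = np.abs(est[sig]).mean() / abs(true_effect)
    type_s = (np.sign(est[sig]) != np.sign(true_effect)).mean()
    return power, type_m, type_s

# Hypothetical scenario: a small 5 ms effect with 25 ms of noise.
# With n = 10, power is low and significant estimates are badly inflated;
# with n = 200, power is high and the exaggeration ratio is near 1.
print(design_analysis(true_effect=5.0, sd=25.0, n=10))
print(design_analysis(true_effect=5.0, sd=25.0, n=200))
```

The key qualitative point the abstract makes falls out directly: conditioning on statistical significance in a low-power design filters for overestimates, so a small p-value alone says little about how trustworthy the estimated effect size is.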