Treating a result as newsworthy, i.e., publishable, because the p-value is less than 0.05 leads to overoptimistic expectations of replicability. The underlying cause of these overoptimistic expectations is Type M(agnitude) error (Gelman & Carlin, 2014): when underpowered studies yield significant results, the effect size estimates are invariably exaggerated and noisy. These exaggerated estimates get published, creating the illusion that the reported findings are robust and replicable. For the first time in psycholinguistics, we demonstrate the adverse consequences of this statistical significance filter by carrying out direct replication attempts of published results from a recent paper. Six experiments (self-paced reading and eyetracking, 168 participants in total) show that the published (statistically significant) claims are so noisy that even non-significant results are fully compatible with them. We also demonstrate the stark contrast between these small-sample studies and a larger-sample study (100 participants): the latter yields much less noisy estimates, but also a much smaller effect magnitude. The smaller magnitude looks less compelling but is more realistic. We suggest that psycholinguists (i) move their focus away from statistical significance, (ii) attend instead to the precision of their estimates, and (iii) carry out direct replications to demonstrate the existence of an effect.
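The Type M error mechanism described above can be illustrated with a small simulation. The numbers here (true effect, standard deviation, sample size) are hypothetical and not taken from the paper; the sketch simply shows that when power is low, the estimates that happen to clear the significance threshold are necessarily far larger than the true effect:

```python
import math
import random
import statistics

random.seed(1)

TRUE_EFFECT = 10.0   # hypothetical true effect (e.g., a 10 ms reading-time difference)
SD = 100.0           # assumed between-subject standard deviation
N = 20               # small sample size, giving low power
SIMS = 5000          # number of simulated studies

def one_study():
    """Simulate one underpowered study; return its estimate and significance."""
    xs = [random.gauss(TRUE_EFFECT, SD) for _ in range(N)]
    mean = statistics.fmean(xs)
    se = SD / math.sqrt(N)          # known-SD simplification (z-test, not t-test)
    significant = abs(mean / se) > 1.96
    return mean, significant

results = [one_study() for _ in range(SIMS)]
sig = [m for m, s in results if s]  # the "publishable" estimates

# Average absolute estimate among significant results, relative to the truth:
# this ratio is the Type M (exaggeration) factor.
exaggeration = statistics.fmean(abs(m) for m in sig) / TRUE_EFFECT
```

Because significance here requires the sample mean to exceed roughly 1.96 × SE ≈ 44, every "publishable" estimate is at least four times the assumed true effect of 10: the significance filter guarantees exaggeration under low power.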
CC0 1.0 Universal