[Video of talk given at ISBA 2018, Edinburgh](https://media.ed.ac.uk/media/1_9w1iellx) It is well-known in statistics (e.g., Gelman & Carlin, 2014) that treating a result as publishable just because the p-value is less than 0.05 leads to overop- timistic expectations of replicability. These overoptimistic expectations arise due to Type M(agnitude) error: when underpowered studies yield significant results, effect size estimates are guaranteed to be exaggerated and noisy. These effects get published, leading to an overconfident belief in replicability. We demonstrate the adverse consequences of this statistical significance filter by conducting seven direct replication attempts (268 participants in total) of a recent paper (Levy & Keller, 2013). We show that the published claims are so noisy that even non-significant results are fully compatible with them. We also demonstrate the contrast between such small-sample studies and a larger-sample study; the latter generally yields a less noisy estimate but also a smaller effect magnitude, which looks less compelling but is more realistic. We reiterate several suggestions from the methodology literature for improving best practices.
OSF does not support the use of Internet Explorer. For optimal performance, please switch to another browser.