In psychology, the main strategy for obtaining empirical effects is null-hypothesis significance testing (NHST). Yet recent attempts have failed to replicate allegedly “established” effects that had counted as well supported. Hence, NHST retains too many errors; otherwise, these effects would replicate. This makes it difficult to trust even results published in top journals. This article advocates the research program strategy (RPS) as superior to NHST. Using both frequentist and Bayesian tools, data simulations show that the six steps of RPS (from a discovery against a random model to the statistical verification of a hypothesis) retain fewer errors than standard uses of NHST. Therefore, RPS results deserve greater trust than NHST results. The simulations also estimate the expected proportion of errors among published results.
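As a rough illustration of how a simulation can estimate the share of errors among published results, here is a minimal sketch. It is not the article's own simulation design. It assumes that "published" means "significant" and that every study runs a one-sided one-sample z-test of H0: μ = 0 with known σ = 1. The parameters `prior_true` (base rate of real effects), `effect_size`, `n`, and `alpha` are hypothetical. The Monte Carlo estimate is checked against the closed form α(1 − p) / (α(1 − p) + (1 − β)p), where p is the base rate and 1 − β the test power.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def simulated_error_share(n_studies=20_000, prior_true=0.3, effect_size=0.5,
                          n=20, alpha=0.05, seed=1):
    """Monte Carlo share of false positives among significant results.

    Each hypothetical study runs a one-sided one-sample z-test (sigma = 1)
    of H0: mu = 0; a real effect is present with probability prior_true.
    """
    rng = random.Random(seed)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    significant = 0
    false_positives = 0
    for _ in range(n_studies):
        effect_is_real = rng.random() < prior_true
        mu = effect_size if effect_is_real else 0.0
        # The mean of n draws from N(mu, 1) is distributed as N(mu, 1/n).
        sample_mean = rng.gauss(mu, 1 / math.sqrt(n))
        if sample_mean * math.sqrt(n) > z_crit:  # z statistic exceeds cutoff
            significant += 1
            if not effect_is_real:
                false_positives += 1
    return false_positives / significant if significant else float("nan")


def analytic_error_share(prior_true=0.3, effect_size=0.5, n=20, alpha=0.05):
    """Closed form: alpha(1 - p) / (alpha(1 - p) + power * p)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha)
    power = 1 - STD_NORMAL.cdf(z_crit - effect_size * math.sqrt(n))
    false_pos = alpha * (1 - prior_true)
    true_pos = power * prior_true
    return false_pos / (false_pos + true_pos)
```

With these hypothetical defaults (30% real effects, d = 0.5, n = 20, α = 0.05), test power is about 0.72 and roughly 14% of significant results are false positives; the simulated share should land close to that value.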
Where test power is unknown, NHST constitutes the first step of RPS (step 1); here, probabilities serve to make a preliminary discovery of an effect. If test power is known, by contrast, a substantial discovery may arise (step 2). Moving beyond discoveries, steps 3 to 6 concern the justification of hypotheses (falsification and verification). This presupposes likelihoods and requires that the data used to test hypotheses have high induction quality (test power). We use Wald’s criterion (the ratio of test power to significance level) to preliminarily or substantially falsify the H0 (steps 3 and 4) and to preliminarily verify the H1 (step 5). Finally, the H1 is substantially verified (step 6) if the likelihood ratio of the H1 to the H0 exceeds Wald’s criterion while the maximum-likelihood estimate from the data lies close to the H1.
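Wald’s criterion and the step-6 rule can be sketched as follows, assuming a normal model with known σ, a point H0 (`mu_h0`), and a point H1 (`mu_h1`). The function names and the closeness threshold `tolerance` are hypothetical; the article’s own operationalization of “close to the H1” may differ, and steps 3 to 5 are not modeled here.

```python
import math


def wald_criterion(power: float, alpha: float) -> float:
    """Ratio of test power (1 - beta) to the significance level alpha."""
    return power / alpha


def log_likelihood_ratio(sample_mean: float, n: int, mu_h1: float,
                         mu_h0: float = 0.0, sigma: float = 1.0) -> float:
    """log[L(H1) / L(H0)] for a normal mean with known sigma.

    The sample mean is sufficient, so this equals the full-sample
    log-likelihood ratio. Working on the log scale avoids underflow.
    """
    se2 = sigma ** 2 / n  # variance of the sample mean
    return ((sample_mean - mu_h0) ** 2 - (sample_mean - mu_h1) ** 2) / (2 * se2)


def substantially_verified(sample_mean: float, n: int, mu_h1: float,
                           power: float, alpha: float, tolerance: float,
                           mu_h0: float = 0.0, sigma: float = 1.0) -> bool:
    """Step-6 check: likelihood ratio beats Wald's criterion AND the MLE is near H1.

    For a normal mean, the maximum-likelihood estimate is the sample mean.
    `tolerance` is a hypothetical closeness threshold.
    """
    log_lr = log_likelihood_ratio(sample_mean, n, mu_h1, mu_h0, sigma)
    beats_criterion = log_lr > math.log(wald_criterion(power, alpha))
    mle_close_to_h1 = abs(sample_mean - mu_h1) <= tolerance
    return beats_criterion and mle_close_to_h1
```

For example, with power 0.95 and α = 0.05, Wald’s criterion is 19. A sample mean of 0.48 from n = 50 gives a log-likelihood ratio of 5.75 (a ratio of roughly 314) for H1: μ = 0.5 over H0: μ = 0, so with a tolerance of 0.1 the H1 counts as substantially verified. A sample mean of 0.9 also beats the criterion but fails the closeness condition, which is why the rule checks both.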
See also the corresponding web application [1].
[1]: https://antoniakrefeldschwalb.shinyapps.io/ResearchProgramStrategy/