The Transparent Psi Project – Introduction and aims
---------------------------
**Introduction**
Recent reports on the lack of reproducibility of important psychological findings, together with growing evidence of a systematic positive bias in published research reports, are often interpreted as signs of a ‘confidence crisis’ in psychological science. ‘Questionable research practices’ (QRPs; Steneck, 2006) and poor protocol delivery have been suggested to lie behind this calamity. Since the beginning of the confidence crisis, skepticism about the quality and trustworthiness of studies has been on the rise in most areas of psychological science, and in the life sciences in general. This increased apprehension can lead to the early or unwarranted dismissal of research findings.
This issue is especially tangible for studies testing controversial hypotheses or reporting findings that contradict the status quo. In these cases, the notion that the finding was artificially generated by QRPs or poor execution of the protocol may seem more parsimonious than the proposed unconventional theory. However, such suspicions are very hard to test. With the exception of a few rare cases where there is actual evidence of QRPs, it is not possible to prove or disprove that the protocol was delivered as intended, or that a particular study is free of QRPs.
One might argue that replications would resolve this issue: the trustworthiness of a finding could simply be tested by repeating the study and seeing whether the finding replicates. Unfortunately, the lack of established methods to verify study integrity hampers the credibility of replications just as much as that of original studies. QRPs and protocol deviations are, in part, outlets for experimenter expectancy effects, so a failure to replicate a particular finding could likewise be explained by bias introduced through QRPs and sloppy study execution. Thus, supporters and skeptics of a particular finding can easily find themselves in a deadlock.
Importantly, replication is a relatively costly method for checking credibility. It requires almost the same amount of resources as the original study, or more if we want to adhere to recent replication standards. This is not worth it if the results of the replication can be easily dismissed on the grounds of potential QRPs and protocol deviations, which cannot be refuted. It may be easier to simply disregard controversial findings altogether until they are proven in a ‘really good study’. Another common way for supporters of the original findings to dismiss failed replications is to label them post hoc as ‘conceptual replications’ that deviate in some minor detail from the original study, which could have caused the discrepant results. More generally, post-hoc criticism of study design can render the whole effort, be it an original study or a replication attempt, a waste of resources.
The risk of irrefutable post-hoc accusations of QRPs or poor protocol delivery, together with the possibility of post-hoc criticism of study design, decreases the relative value of conducting studies. This, combined with our reliance on resource-intensive replications to establish credibility, makes the research endeavour costly, and this relative cost keeps increasing along with the scepticism in the field. Therefore, there is a critical need for methodological advances that can cost-effectively increase the credibility and acceptability of original research, and we need to set up clear criteria for conducting trustworthy research that can be adhered to in future scientific ventures.
First, the risk of post-hoc criticism can be decreased by peer review of the research plan before the start of the study. With the rise of registered reports, the risk of conducting a study with a fundamentally flawed methodology is somewhat reduced, since researchers can get expert feedback on the study design before initiating data collection. However, registered reports are still constrained by the limitations of the peer-review system; for example, only a handful of reviewers get to comment on the proposal, which offers limited protection from post-hoc criticism (especially interest-driven criticism) by others. These limitations could be overcome by expanding the concept of registered-report peer review into a community-based peer review of the research plan.
Second, developing methodologies that decrease or eliminate opportunities for QRPs, and that make as-intended protocol delivery verifiable, could be the solution to reinstate confidence in original research. Recently, there has been a rise in transparency initiatives that could help reduce the prevalence of QRPs, such as the encouragement of pre-registration of experiments by professional and governmental agencies, calls to publish research reports irrespective of their outcome, making reports available to a wider audience through self-archiving and open-access publication, open publishing platforms, data repositories, and initiatives for large-scale multi-lab replications. Furthermore, [best practice guidelines][1] have been set up to further improve the credibility and integrity of research. However, inconsistencies in protocol delivery, result-driven exclusion or imputation of data, and fraud are not prevented by these interventions and guidelines, because the study itself and the data collection are still performed ‘in the dark’. To eliminate this last safe haven for QRPs and sloppy protocol execution, further innovations are in order.
**A pivotal case**
An area where tensions regarding trustworthiness are at an extreme is parapsychology. Recently, several publications have reported positive results in support of ‘psi’ phenomena (Bem, 2011; Storm, Tressoldi, & Di Risio, 2010). Even though some of these reports were published in high-profile psychology journals, they were generally poorly received. For example, in his Experiment 1, Bem (2011) found a statistically significant, higher-than-chance prediction rate of future randomly determined events, and interpreted this result as evidence that humans have future-telling (precognition) abilities. The interpretation of the results as evidence for human extrasensory perception (ESP) was criticized on several accounts (Fiedler & Krueger, 2013; Rouder & Morey, 2011; Schwarzkopf, 2014). Those who offered counter-explanations for the positive findings usually invoked some type of QRP, as well as problems with the execution of the studies (Wagenmakers, Wetzels, Borsboom, Kievit, & van der Maas, 2015; Wagenmakers, Wetzels, Borsboom, & van der Maas, 2011).
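To make the statistical claim concrete: in a two-choice guessing task like that of the original experiment, the chance-level hit rate is 50%, and the question is whether the observed hit rate reliably exceeds it. Below is a minimal sketch of such a test using a one-sided binomial test; the trial counts are hypothetical, purely for illustration, and this is not the confirmatory analysis of the original study or of our replication.

```python
# Minimal sketch: testing whether a hit rate exceeds the 50% chance level
# in a two-choice guessing task. The counts below are hypothetical and
# purely illustrative; they are not Bem's (2011) data.
from scipy.stats import binomtest

n_trials = 3600   # hypothetical total number of guesses
n_hits = 1880     # hypothetical number of correct guesses (~52.2%)

# One-sided test against the 50% chance level
result = binomtest(n_hits, n_trials, p=0.5, alternative="greater")
print(f"hit rate = {n_hits / n_trials:.3f}, one-sided p = {result.pvalue:.4f}")
```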
The scientific discourse surrounding these parapsychology studies played an important role in initiating the reformist movement in psychological science. Parapsychology experiments in general, and Bem's (2011) experiments in particular, therefore provide an excellent opportunity to test the impact and effectiveness of innovations enabling verifiable credibility. Thus, we chose to implement the credibility-enhancing methodologies devised by our group in a large-scale replication of Bem’s (2011) Experiment 1, in order to test the effects of these interventions in a setting where they are expected to have a very high impact.
**Solutions**
*Solution to decrease the risk of post-hoc criticism*
One recognized way to increase confidence in the reliability of ESP study findings is to develop the study design jointly between psi-proponent and skeptical laboratories (Schlitz, Wiseman, Watt, & Radin, 2006; Wagenmakers et al., 2015). Joint development of a study between proponents and opponents of a theory can greatly improve the acceptability of the findings, because foreseeable methodological issues can be worked out before the study is executed. The parties can also agree in advance on how the different possible results will be interpreted, making the conclusions of the study mutually acceptable. Credibility can be improved further by both parties making clear what assurances are required for the study to be considered credible, and by implementing these in the study design. Such joint studies are currently very rare and usually involve only a small number of researchers. However, if done on a large enough scale, involving a significant portion of the stakeholders in the field, this approach could yield results and interpretations that are credible and acceptable to both proponents and opponents of the theory.
In Stage 1 of our project we submitted our research plan to a **‘Consensus Design Process’**, a process involving a large number of stakeholders, all of whom have contributed to the debate surrounding Bem's (2011) experiments. These stakeholders were identified via a systematic review of the literature and then approached. The initial study protocol of the replication study was peer reviewed by these stakeholders, and the protocol was iteratively amended until consensus was reached on the acceptability of the design, using a ‘reactive-Delphi’ process (McKenna, 1994). The final review panel included 23 researchers with a roughly equal mix of proponents and opponents of the ESP theory, among them the author of the original study. The details of this process are reported in a separate publication.
*Solution to eliminate opportunities for QRPs and make as-intended protocol delivery verifiable*
We implement methodologies that promote verifiable credibility, applied here to the replication of Bem’s (2011) Experiment 1. Specifically, we use:
- Real-time born-open data: as data are collected, they are immediately pushed to a publicly accessible database that keeps an audit trail, ensuring data integrity (see the sketch following this list).
- Consensus Design Process.
- Consensus-based pre-registration of the conclusions to be drawn from each possible outcome of the confirmatory hypothesis test.
- Verifying as-intended protocol delivery through video recordings of scripted trial sessions.
- Validation of the study’s software code by an independent party.
- Involving independent research auditors to verify study integrity.
- Recording any potential protocol deviations.
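To illustrate the first item on the list, real-time born-open data can be implemented by committing every new observation to a publicly accessible, version-controlled repository the moment it is recorded, so that the public commit history itself becomes a tamper-evident audit trail. The sketch below assumes a Git-backed public repository; the repository path, file name, and record format are hypothetical illustrations, and the project's actual data pipeline is specified in the research plan.

```python
# Minimal sketch of real-time "born-open" data: each completed trial is
# appended to a CSV file and immediately committed and pushed to a public
# Git repository, whose commit history forms the audit trail.
# The repository path and file name below are hypothetical.
import csv
import subprocess
from datetime import datetime, timezone

REPO_DIR = "/path/to/public-data-repo"   # hypothetical local clone of a public repo
DATA_FILE = "trial_data.csv"             # hypothetical open data file

def record_trial(session_id: str, trial_no: int, guess: str, target: str) -> None:
    """Append one trial to the open data file and publish it immediately."""
    timestamp = datetime.now(timezone.utc).isoformat()
    with open(f"{REPO_DIR}/{DATA_FILE}", "a", newline="") as f:
        csv.writer(f).writerow(
            [timestamp, session_id, trial_no, guess, target, guess == target]
        )
    # Commit and push right away, so the public history records when each
    # data point was created (tamper-evident audit trail).
    subprocess.run(["git", "-C", REPO_DIR, "add", DATA_FILE], check=True)
    subprocess.run(
        ["git", "-C", REPO_DIR, "commit", "-m", f"trial {session_id}/{trial_no}"],
        check=True,
    )
    subprocess.run(["git", "-C", REPO_DIR, "push"], check=True)
```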
**Specific Aim**
In the replication study, our goal is to determine whether the higher-than-chance prediction rate of later randomly determined events found in Bem’s (2011) Experiment 1 (from here on: the original study) can be replicated when we incorporate the above-mentioned credibility-enhancing techniques, thus making study integrity verifiable.
[See our detailed research plan here][2]
**Funding**
The research program is funded by the Bial Foundation.
![Bial Foundation logo][3]
**References**
Bem, D. J. (2011). Feeling the future: experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407-425. doi:10.1037/a0021524
Fiedler, K., & Krueger, J. I. (2013). Afterthoughts on precognition: No cogent evidence for anomalous influences of consequent events on preceding cognition. Theory & Psychology, 23(3), 323-333. doi:10.1177/0959354313485504
McKenna, H. P. (1994). The Delphi technique: a worthwhile research approach for nursing? Journal of Advanced Nursing, 19(6), 1221-1225. doi:10.1111/j.1365-2648.1994.tb01207.x
Rouder, J. N., & Morey, R. D. (2011). A Bayes factor meta-analysis of Bem’s ESP claim. Psychonomic Bulletin & Review, 18(4), 682-689. doi:10.3758/s13423-011-0088-7
Schwarzkopf, D. S. (2014). We should have seen this coming. Frontiers in Human Neuroscience, 8, 332. doi:10.3389/fnhum.2014.00332
Steneck, N. H. (2006). Fostering integrity in research: Definitions, current knowledge, and future directions. Science and Engineering Ethics, 12(1), 53-74.
Storm, L., Tressoldi, P. E., & Di Risio, L. (2010). Meta-analysis of free-response studies, 1992–2008: Assessing the noise reduction model in parapsychology. Psychological Bulletin, 136(4), 471-485. doi:10.1037/a0019457
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., Kievit, R., & van der Maas, H. L. J. (2015). A skeptical eye on psi. In E. May & S. B. Marwaha (Eds.), Extrasensory Perception: Support, Skepticism, and Science (pp. 153-176). Santa Barbara, CA: ABC-CLIO.
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., & van der Maas, H. L. (2011). Why psychologists must change the way they analyze their data: the case of psi: comment on Bem (2011). Journal of Personality and Social Psychology, 100(3), 426-432. doi:10.1037/a0022790
[1]: https://osf.io/jk2zf/wiki/Best%20practice%20guidelines%20to%20improve%20research%20integrity/
[2]: https://osf.io/sb2dt/wiki/00%20Executive%20Summary/
[3]: https://static1.squarespace.com/static/533155cae4b08b6d16d34ec4/t/555c9b76e4b0e543f2e387b2/1432132471283/?format=300w