**Stage 1, Consensus Design Process**
**Problem statement**
Recent reports on the lack of reproducibility of important psychological findings, together with growing evidence for a systematic positive bias in published research reports, are often interpreted as signs of a ‘confidence crisis’ in psychological science. One of the factors suggested to lie behind this calamity is a collection of ‘questionable research practices’ (QRPs; Steneck, 2006). Much-debated research areas, such as parapsychology, suffer even more under the burden of the confidence crisis, because QRPs offer a convenient and parsimonious explanation for anomalous findings, especially when these do not fit status quo theories.
Recently, several publications reported positive results in support of ‘psi’ phenomena in high profile psychology journals (Bem, 2011; Storm, Tressoldi, & Di Risio, 2010). Nevertheless, these reports met with a poor reception. The interpretation that the results constitute evidence for extrasensory perception (ESP) was criticized on several grounds (Fiedler & Krueger, 2013; Rouder & Morey, 2011; Schwarzkopf, 2014), and those who offered a counter-explanation for the positive findings usually pointed to some type of QRP, or to problems with the execution of the studies (Wagenmakers, Wetzels, Borsboom, Kievit, & van der Maas, 2015; Wagenmakers, Wetzels, Borsboom, & van der Maas, 2011). Notably, however, some studies indicate that controlling for the estimated effect of QRPs cannot account for all of the effects shown in recent meta-analyses (Bierman, Spottiswoode, & Bijl, 2016).
Such a low level of confidence in the quality of studies can lead to the unwarranted dismissal of good research findings. This problem is prominent in ESP research, but it readily generalizes to other controversial research areas and to science as a whole. Low credibility hurts science because researchers have to spend valuable resources on verifying published findings instead of being able to accept them as objective observations, or they may choose to disregard reports altogether because their credibility cannot be verified, making the original study a waste of resources. Therefore, there is a critical need for methodological approaches that can increase the credibility and acceptability of research, and we need to set up clear criteria for credibility that can be adhered to in future scientific ventures.
**Solution**
One recognized way to increase confidence in the reliability of findings of ESP studies is to develop a study design jointly between psi proponent and skeptical laboratories (Schlitz, Wiseman, Watt, & Radin, 2006; Wagenmakers et al., 2015). The joint development of a study by proponents and opponents of a theory can greatly improve the acceptability of the findings, because foreseeable methodological issues can be worked out before the study is executed. The parties can also agree on the interpretation of the different possible results in advance, making the conclusions of the study mutually acceptable. This approach can also improve the credibility of research: both parties make clear what assurances are required for a study to be considered credible and implement them in the study design.
Such joint studies are very rare at the moment and usually involve only a small number of researchers. However, if done on a large enough scale, involving a significant portion of the stakeholders in the field, this approach could yield results and interpretations that are credible and acceptable to both proponents and opponents of the theory. We call this process **‘Consensus Design’**.
Large scale parallel replications also hold great potential for increasing the acceptability of research findings; see, for example, Bierman and Jolij (2016).
**Aims**
Our aim in this study is to develop a consensus design for a replication study of Bem’s (2011) experiment 1 that is **mutually acceptable** to both psi proponent and mainstream researchers (psi sceptics) and that contains clear criteria for credibility.
**Methods**
To ascertain acceptability to all stakeholders in the field, the protocol of the replication study will be peer reviewed by a large number of psi proponent and mainstream researchers. The study protocol will be revised until a sufficiently high level of acceptability is reached. To achieve this, we will utilize a ‘reactive-Delphi’ process (McKenna, 1994).
The Delphi method involves presenting a survey or interview to a panel of ‘informed individuals’ in the field of study with the goal of seeking their opinion on the issue of interest. After they respond, a new survey is designed based on the information gained from these responses. This second survey is returned to the participants, who are asked to reconsider their responses or to respond to new questions based on the results of the first round. This iterated process usually results in a convergence of judgements (Murphy et al., 1998). The procedure is repeated until consensus is reached among the participants or another stopping rule is triggered. For an overview of this technique see, for example, McKenna (1994). Additionally, in the reactive-Delphi process the respondents are asked to react to previously presented information in a formalized way. We will use this methodology to seek revision suggestions and acceptability ratings from our respondents.
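For illustration only, the following minimal Python sketch simulates this iterative loop. The simulated panel data, the cap on the number of rounds, and the IQR-based consensus criterion are all hypothetical assumptions made for the sketch; the actual criteria used in this study are given by our pre-registered stopping rules (see ‘End of the consensus process’ below).

```python
import random
import statistics

def summarize(ratings):
    """Descriptive feedback for one round: median, range, and IQR."""
    q1, _, q3 = statistics.quantiles(ratings, n=4)
    return {"median": statistics.median(ratings),
            "range": (min(ratings), max(ratings)),
            "iqr": q3 - q1}

def simulated_round(n_panelists, spread):
    """Stand-in for a real survey round: n panelists rate on a 0-9 scale."""
    return [min(9, max(0, round(random.gauss(6, spread))))
            for _ in range(n_panelists)]

spread, consensus, round_no = 3.0, False, 0
while not consensus and round_no < 5:       # round cap is an assumption
    round_no += 1
    ratings = simulated_round(15, spread)
    stats = summarize(ratings)
    consensus = stats["iqr"] <= 1           # hypothetical consensus criterion
    spread *= 0.6                           # feedback tends to converge judgements
    print(f"round {round_no}: {stats}, consensus: {consensus}")
```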
**Participants**
We will conduct a literature search to identify stakeholders in the field. Participant candidates will be asked to forward the study invitation to researchers who they think might contribute to the design of the replication study and who are eligible for participation.
Authors of scientific publications contributing to the debate surrounding Bem and colleagues’ 2011 and 2015 papers (Bem, 2011; Bem, Tressoldi, Rabeyron, & Duggan, 2015) and to closely related topics, such as the replicability of psi research, meta-analyses of psi research findings, and methodologies of precognition studies, will be eligible for participation. [Eligibility criteria are listed in more detail at this link.][1]
Participant candidates will be contacted via an e-mail containing information about the study and asking them to follow a link if they are interested in participating in the Delphi survey. The invitation will specify that participants will not be granted authorship in the resulting publications. The link will lead to an initial website where candidates can establish their eligibility by providing a link or reference to a relevant publication and the total number of their peer-reviewed publications.
The publication nominating the researcher as a stakeholder, together with other recent publications by the same author, will be reviewed to determine whether the participant should be considered a psi proponent researcher or a mainstream researcher.
We anticipate that about 100 researchers will be contacted, of whom approximately 30-40 will agree to participate and will be eligible. Our goal is to have at least 10 stakeholders representing each side at study start (but see the stopping rules below for details about the minimum sample size). Prior studies indicate that a sample size of 12-20 or larger is in most cases sufficient to produce a very reliable and stable judgement in consensus studies (Jorm, 2015; Murphy et al., 1998).
**Procedure**
After the conclusion of recruitment, we will send participants a link pointing to the online survey package. The survey package will consist of an online survey and links to the following supporting materials:
- a description of the original Bem (2011) study,
- a detailed research protocol of the planned replication study, with a rationale for any planned deviation from the original design,
- background material on the statistical methods intended to be used in the replication study.
The online survey will ask for:
- a rating of the perceived level of methodological quality of the protocol (on a 0-9 Likert scale),
- a rating of the protocol’s security against QRPs (on a 0-9 Likert scale),
- a free text response on the appropriate level of statistical confidence to seek in the replication study to draw our [pre-defined conclusions][2],
- a free text response on how to improve the study design,
- a free text response on how to increase security against QRPs.
Once the responses are collected, a results summary will be compiled to provide feedback for the panel members. The results summary will contain descriptive statistics of the numerical ratings (median, range, interquartile range) and, where possible, of the suggested levels of statistical confidence to seek in the study. Studies on the role of feedback in consensus methods show that it is beneficial for reaching consensus if not only numerical ratings are fed back to the participants, but also the reasons for the ratings (Gowan & McNichols, 1993; Woudenberg, 1991). Thus, free text responses will also be made available in the results summary. After the removal of any identifying information, these free text responses will either be added to the summary as is or, in the case of long responses, be represented by an excerpt containing all main points.
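As a concrete illustration of how such a results summary could be assembled, the sketch below computes the descriptive statistics named above and excerpts long free text responses by simple truncation. The function name, the input structure, and the truncation rule are assumptions made for illustration; in the actual study, excerpts will be prepared so that all main points are preserved, and identifying information will have been removed beforehand.

```python
import statistics

def build_results_summary(numeric_ratings, free_text_responses, max_len=400):
    """Assemble round feedback: per-question statistics plus comments.

    numeric_ratings:     dict mapping question -> list of 0-9 ratings
    free_text_responses: de-identified comments (anonymization is assumed
                         to have happened already)
    """
    stats = {}
    for question, ratings in numeric_ratings.items():
        q1, _, q3 = statistics.quantiles(ratings, n=4)
        stats[question] = {"median": statistics.median(ratings),
                           "range": (min(ratings), max(ratings)),
                           "interquartile_range": q3 - q1}
    # Long responses are excerpted rather than dropped; plain truncation
    # here stands in for a manual excerpt covering all main points.
    comments = [c if len(c) <= max_len else c[:max_len] + " [...]"
                for c in free_text_responses]
    return stats, comments

summary, comments = build_results_summary(
    {"methodological quality": [7, 8, 6, 9, 7, 8],
     "security against QRPs": [5, 6, 8, 7, 6, 7]},
    ["Consider making the trial-level data openly available."])
```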
The proposed replication study protocol will be amended based on the results of the survey. Changes to the protocol will be tracked and annotated to clearly indicate the rationale for the amendments.
Subsequently, a new survey will be sent out to the participants. The survey will be identical to the one completed in the previous round. The following supporting materials, accessible through external links, will accompany the survey:
- the results summary of the previous survey,
- a reminder of the participant’s own ratings in the previous round,
- the amended research protocol of the planned replication study, with tracked changes and a detailed rationale for the changes,
- the Bem (2011) manuscript containing the original study,
- additional material to support the protocol amendments (if necessary).
Responses from the survey will be collected and analyzed, and the study protocol will be amended, in the same way as after the first survey round. This procedure will be repeated until one of the stopping rules is triggered (see below).
At the end of the survey, participants will be asked whether they agree to be acknowledged by name in the publications of the results of the Delphi survey.
The above procedure has been pilot tested before the initiation of the actual consensus design process.
**Timeframe**
The initial recruitment phase is expected to last 2-4 weeks. We will stop recruitment and commence with the first stage of the survey if at least 15 participants represent each of the mainstream and psi proponent researchers at the end of the second week of recruitment, or if at least 10 participants represent each side at the end of the fourth week. If these conditions are not met, we will continue accrual until the end of week 9, exhausting all avenues to contact potential participants.
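These recruitment rules can be read as a simple decision procedure; a sketch with hypothetical function and argument names is given below. The text above leaves the outcome of the final week-9 check to the stopping rules, so the last branch is an assumption.

```python
def recruitment_decision(week, n_proponents, n_mainstream):
    """Sketch of the recruitment rules; checks occur at the end of a week."""
    per_side = min(n_proponents, n_mainstream)
    if week == 2 and per_side >= 15:
        return "start first survey round"   # 15 per side by end of week 2
    if week == 4 and per_side >= 10:
        return "start first survey round"   # 10 per side by end of week 4
    if week >= 9:
        # Assumption: the week-4 threshold still applies at the final check;
        # otherwise the 'unsuccessful recruitment' stopping rule is triggered.
        return ("start first survey round" if per_side >= 10
                else "unsuccessful recruitment")
    return "continue recruiting"

# Example: 12 proponents and 11 mainstream researchers at the end of week 4.
print(recruitment_decision(week=4, n_proponents=12, n_mainstream=11))
```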
Participants will get 3 weeks to respond to each survey (with weekly reminders), after which 1-2 weeks will be needed to prepare the material for the next survey round and to gather any lagging responses.
Considering the stopping rules (see below), this study will last at most 28 weeks.
**Dropouts**
Participants who respond to the first round of the survey will receive the material for all subsequent survey rounds. Participants who do not respond to the first round will not be invited to subsequent survey rounds.
**End of the consensus process**
The consensus process will be either:
1) concluded with unsuccessful recruitment,
2) concluded with consensus, or
3) concluded with no consensus,
depending on which of our [stopping rules][3] is triggered.
**Results**
The main result of the current Delphi study will be a study protocol revised over the survey rounds based on the feedback of the participants. This finalized study protocol will be carried over to the pilot test stage of the replication study. Another important output of the current study is the process itself, demonstrating how a consensus study design can be reached.
**Funding**
The research program is funded by the Bial Foundation.
![Bial Foundation logo][4]
*References*
Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407-425. doi:10.1037/a0021524
Bem, D. J., Tressoldi, P., Rabeyron, T., & Duggan, M. (2015). Feeling the future: A meta-analysis of 90 experiments on the anomalous anticipation of random future events. F1000Research, 4.
Bierman, D. J., & Jolij, J. (2016). Large scale collaborative and adversarial replication projects of controversial scientific findings. Presented at the PA Convention, Boulder, CO, USA.
Bierman, D. J., Spottiswoode, J. P., & Bijl, A. (2016). Testing for questionable research practices in a meta-analysis: An example from experimental parapsychology. PLoS ONE, 11(5), e0153049.
Fiedler, K., & Krueger, J. I. (2013). Afterthoughts on precognition: No cogent evidence for anomalous influences of consequent events on preceding cognition. Theory & Psychology, 23(3), 323-333. doi:10.1177/0959354313485504
Gowan, J. A., & McNichols, C. W. (1993). The effects of alternative forms of knowledge representation on decision-making consensus. International Journal of Man-Machine Studies, 38(3), 489-507. doi:10.1006/imms.1993.1023
Jorm, A. F. (2015). Using the Delphi expert consensus method in mental health research. Australian and New Zealand Journal of Psychiatry, 49(10), 887-897. doi:10.1177/0004867415600891
LeBel, E. P., & Peters, K. R. (2011). Fearing the future of empirical psychology: Bem's (2011) evidence of psi as a case study of deficiencies in modal research practice. Review of General Psychology, 15(4), 371. doi:10.1037/a0025172
McKenna, H. P. (1994). The Delphi technique: a worthwhile research approach for nursing? Journal of Advanced Nursing, 19(6), 1221-1225. doi:10.1111/j.1365-2648.1994.tb01207.x
Murphy, M., Black, N., Lamping, D., McKee, C., Sanderson, C., Askham, J., & Marteau, T. (1998). Consensus development methods, and their use in clinical guideline development. Health Technology Assessment, 2(3), i-iv, 1-88. doi:10.4135/9781848608344.n24
Rouder, J. N., & Morey, R. D. (2011). A Bayes factor meta-analysis of Bem’s ESP claim. Psychonomic Bulletin & Review, 18(4), 682-689. doi:10.3758/s13423-011-0088-7
Schlitz, M., Wiseman, R., Watt, C., & Radin, D. (2006). Of two minds: Sceptic‐proponent collaboration within parapsychology. British Journal of Psychology, 97(3), 313-322. doi:10.1348/000712605X80704
Schwarzkopf, D. S. (2014). We should have seen this coming. Frontiers in Human Neuroscience, 8(332). doi:10.3389/fnhum.2014.00332
Steneck, N. H. (2006). Fostering integrity in research: Definitions, current knowledge, and future directions. Science and Engineering Ethics, 12(1), 53-74.
Storm, L., Tressoldi, P. E., & Di Risio, L. (2010). Meta-analysis of free-response studies, 1992–2008: Assessing the noise reduction model in parapsychology. Psychological Bulletin, 136(4), 471-485. doi:10.1037/a0019457
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., Kievit, R., & van der Maas, H. L. J. (2015). A skeptical eye on psi. In E. May & S. B. Marwaha (Eds.), Extrasensory Perception: Support, Skepticism, and Science (pp. 153-176). Santa Barbara, CA: ABC-CLIO.
Wagenmakers, E.-J., Wetzels, R., Borsboom, D., & van der Maas, H. L. (2011). Why psychologists must change the way they analyze their data: The case of psi: Comment on Bem (2011). Journal of Personality and Social Psychology, 100(3), 426-432. doi:10.1037/a0022790
Woudenberg, F. (1991). An evaluation of Delphi. Technological Forecasting and Social Change, 40(2), 131-150. doi:10.1016/0040-1625(91)90002-W
[1]: https://osf.io/he8sb/wiki/Eligibility%20criteria/
[2]: https://osf.io/he8sb/wiki/Hypotheses%20and%20conclusions/
[3]: https://osf.io/he8sb/wiki/Stopping%20rules/
[4]: https://static1.squarespace.com/static/533155cae4b08b6d16d34ec4/t/555c9b76e4b0e543f2e387b2/1432132471283/?format=300w