Frequently Asked Questions
--------------------------

Daniel J. Simons and Walter Boot

*This document provides some definitions and then answers questions about the claims in our paper. If you have questions not addressed in this FAQ or in the paper, please email us.*

----------

Some Definitions
----------------

*Randomized Controlled Trial (RCT)*: A randomized controlled trial is an intervention study that includes at least one treatment condition and at least one control condition, with participants randomly assigned to the conditions.

*Double Blind Design*: In a double blind design, both the participants and the experimenters are kept in the dark about the condition assignment. That is, participants do not know whether they are in a treatment condition or a control condition, and experimenters do not know which participants are in each condition. In a single-blind design, participants do not know whether they are in the treatment or control condition, but experimenters are aware of the condition assignments.

*Placebo-controlled design*: A placebo control condition is one that appears in all respects to be identical to the treatment condition but that lacks the critical ingredient of the treatment.

*Active Control*: An active control group is one in which participants engage in some task during the intervention period. Active control groups are not necessarily matched to the treatment group in any way, and the tasks performed by an active control group might differ in many ways from those performed by the experimental group.

*No-Contact control*: A no-contact control group takes the same pre-test and post-test as the treatment group, but does not complete any task during the intervention period (they have no contact with the experimenters).

*Waitlist control*: A waitlist control group believes that they will receive the treatment at some later point. Typically, such participants do not complete the pre-test with the treatment group.
They typically do complete the post-test.

*The gold standard*: The gold standard for an intervention is a double-blind randomized controlled trial with a placebo control group that is matched to the treatment group in all respects except for the treatment.

----------

Frequent Questions and Our Answers
----------------------------------

<br><br>
**Why is a double-blind randomized controlled trial with a placebo condition the gold standard design for an intervention study?**

The double-blind randomized controlled design minimizes the chance that differences between the intervention group and control group are due to factors other than the treatment each group is receiving.

Let’s start with randomization: When people are randomly assigned to an intervention or control group, pre-existing differences between people will tend to be evenly distributed across the conditions (with a large enough sample size). Randomization helps to eliminate systematic differences between the people who receive the treatment and those who are in the control group.

Next consider blinding: With a double-blind design, neither the experimenters assessing the participants nor the participants themselves know who is in the control group and who is receiving the treatment. If experimenters do not know who is receiving the treatment, their own biases and expectations cannot contribute to any difference between the groups. If participants know that they are receiving the real treatment, they may expect more improvement, and those expectations can produce improvements (the placebo effect). If participants in the treatment condition expect more improvement than do those in the control condition, any difference between the groups might be due to expectations rather than the treatment itself. When participants are blind to whether they are in the treatment condition or the control condition, their expectations can’t affect differences between the conditions.
The final component of this gold-standard design is a control group that, from the participant’s perspective, looks identical to the treatment group, but contains no active ingredient (i.e., the control group should not lead to improvements because it does not include the critical ingredient). If the two conditions are indistinguishable to participants, then it is unlikely that one group will expect to improve more than the other. If the intervention group subsequently improves more than the control group, the source of this improvement cannot be attributed to expectations and must be due to the active ingredient of the intervention.

<br><br><br>
**Why don’t psychology interventions use the gold standard approach?**
<br><br>
They can’t. In a medical trial, the control group might be given a pill that looks exactly like the treatment, but that contains no active ingredient. Because the experience of the control group and intervention group is identical (except for the unseen contents of the pill), expectations should be identical as well. Psychological interventions are more complicated than simply taking a pill, and participants in psychological interventions see the contents of their intervention. For example, a participant receiving an experimental form of therapy to treat their anxiety knows that they are receiving therapy and the type of therapy they are receiving. In many cases it is impossible for participants to be blind to the nature of their treatment, leaving open the possibility that any improvements observed are due to the expectation that improvements should occur (i.e., placebo effects).

We illustrate one example in our paper: In many interventions that test whether action video games improve cognition, the control group plays many hours of Tetris. We found that participants don’t expect Tetris to result in much improvement, but participants in the action game conditions do expect improvements.
Because expectations differ, they might contribute to differences between the experimental and control groups, meaning that it is inappropriate to conclude that the intervention improved performance.

<br><br><br>
**Why does the article single out video game interventions? Do they use particularly bad designs?**
<br><br>
No. In fact, these intervention studies are among the best-controlled interventions in the published literature. Many intervention studies still compare the effectiveness of a treatment to the performance of a control group that simply does nothing. This does little or nothing to control for the potential effect of differential expectations (it is unlikely that participants would believe doing nothing would result in improvement). We focus on video game interventions to show that even studies that control for many other important variables still do not account for differential expectations.

<br><br><br>
**What are some of the challenges that researchers face that make it difficult to test an intervention against the strongest possible control group?**
<br><br>
We are sympathetic to the challenge of identifying an ideal control condition. We have struggled with this challenge in our own research. Inadequate funding unfortunately can contribute to the use of inadequate control groups. Including an active control condition effectively doubles the cost of an intervention compared to just testing the experimental group. When budgets for intervention studies are cut, investigators are forced to reduce the quality of their controls. (We would like funding agencies to reconsider the implications of such cuts -- if reduced funding makes a good control group impossible, it’s not clear that funding a study with a weak control is a good use of funds.)

Another factor that contributes to the use of weak control groups is the acceptance, publication, and publicity accrued by studies making bold causal claims without appropriate control groups.
The ability to publish a study that lacks solid controls in a top journal disincentivizes researchers from undertaking the more challenging task of developing an ideal control group. By bringing these concerns to light, we hope to change these incentives: reviewers and editors have the power to insist that interventions use active control groups that address differential expectations. Despite the challenges faced by psychology interventions, the methods we outline in our paper can increase the validity of the causal conclusions drawn from such studies.

<br><br><br>
**If psychology interventions typically cannot use a double-blind design, should we ignore or dismiss all evidence from such interventions?**
<br><br>
No. There are many ways to account for expectation effects even if the gold standard design is unattainable for a particular type of intervention. You should, however, be highly skeptical of any claim that an intervention was effective if the study did not take steps to account for differential expectations between the experimental and control groups.

<br><br><br>
**Without a double blind design, can a psychology study ever provide strong evidence for the causal efficacy of a treatment?**
<br><br>
Yes. In our paper we outline a number of methods to assess expectations, choose control conditions that equate expectations, and choose outcome measures that are insensitive to expectations. If participants in the control and intervention groups expect the same degree of improvement on an outcome measure, then placebo effects are less likely. Even when the gold standard design is unattainable, researchers can use such “silver” standard designs and then check for the problems that the gold standard design eliminates.

<br><br><br>
**Is there an ideal design to use if a double blind design is not possible?**
<br><br>
No.
However, in our paper we discuss a number of methods to measure, equate, and manipulate expectations so that differential improvement by the intervention group is unlikely to be the result of differential expectations. None of these methods provides the strength of evidence that an actual placebo-controlled randomized double-blind design can provide, but they can approximate it. All published studies that cannot use a placebo-controlled randomized double-blind design should note the limitations of the design and the inferences it permits.

<br><br><br>
**Should an intervention that lacks controls for differential placebo effects be publishable?**
<br><br>
As long as a study frankly discusses its limitations and acknowledges that strong causal conclusions are not merited, then yes. However, in the future, the standards for publication of interventions need to change. Intervention studies are expensive, and it is a waste of resources to use inadequate control conditions. Studies that control for and measure expectations are the ones that should appear in top-tier journals. Studies that do not account for differential expectations and make unqualified causal claims anyway should not be published. Studies that do not acknowledge the limitations imposed by a lack of control for differential expectations should not be published. Studies that use a waitlist control group as the only comparison should not be published -- that type of control does nothing to account for even the most basic alternative explanations for improvements. Studies that use a no-contact control should acknowledge that they only controlled for test-retest effects and nothing else. They should be published only if claims of intervention benefits are treated as entirely speculative.
We hope that journals (and reviewers) will adopt higher standards, publishing studies that use active control groups and check for differential placebo effects, requiring explicit discussion of limitations when they do not, and ensuring that articles do not make causal claims without including adequate control conditions.

<br><br><br>
**Does the requirement to test for expectations set the bar for methodological rigor too high?**
<br><br>
We believe it is fair to demand adequate testing and control for placebo effects in all psychology interventions. Psychologists teach their students the importance of eliminating expectation effects and demand characteristics from experimental designs. It is a core principle of good design. Why should we accept less for published experimental work?

<br><br><br>
**Should the first, exploratory studies of a new intervention be held to the same standards as interventions in more established fields?**
<br><br>
Yes. Any intervention, even one addressing a new experimental question, should include adequate tests for expectation effects. A study lacking appropriate controls risks wasting effort, money, and time as researchers pursue false leads. Moreover, the methods of an initial, flawed study can become entrenched as standard practice, leading to their perpetuation; new studies justify their lack of control by citing previous studies that did the same. At a minimum, in the absence of appropriate controls for expectations, articles should fully acknowledge the limits of the research, noting that any conclusion about the causal efficacy of the intervention is premature. We prefer using appropriate methods even in the first study. (edited quote from p. 452 of the article).

<br><br><br>
**Does converging evidence overcome the lack of controls for expectations in any individual intervention study?**
<br><br>
Not necessarily. Converging evidence means little if individual studies do not eliminate confounds.
Converging evidence bolsters causal claims only to the extent that the methods of the individual studies inspire confidence in the efficacy of the treatment. (edited quote from p. 452 of the article)

<br><br><br>
**If a study is described as a randomized controlled trial, does that mean that improvements in the treatment condition are due to the intervention?**
<br><br>
No. The use of a randomized controlled design means that participants were randomly assigned to the experimental and control groups. With a large enough sample size, that means that differences between the groups are less likely to be due to differences in who participated in each condition. However, random assignment does nothing to eliminate differential expectation effects between the treatment and control conditions.

Note that any study lacking randomized assignment to conditions provides extremely weak evidence that differences between conditions resulted from the treatment. Any effects could just be due to who was in each group rather than what they did as part of that group.

<br><br><br>
**If a study claims to have a placebo control condition, does that mean it addressed the concerns raised in the paper?**
<br><br>
Unfortunately, probably not. Many articles mistakenly describe any control condition as a placebo condition even if it does nothing to control for placebo effects. Some papers have described a no-contact control group as a placebo control! As noted in the paper, even active control conditions do not automatically account for differential placebo effects. Only control conditions that are equated to the treatment condition in the expectations for improvement on the critical outcome measures can be considered placebo controls. Unless reporting and reviewing standards improve, readers will need to evaluate whether a condition labeled as a placebo condition actually controlled for placebo effects.
<br><br><br>
**If the intervention group showed bigger improvements on some outcome measures but not on others, did the study control for placebo effects?**
<br><br>
No. Participants might well have different expectations for different outcome measures. For example, participants playing Tetris would have more reason to expect improvements in their ability to rotate shapes in their mind’s eye than in their ability to remember stories. A given treatment will generate expectations for improvement on some outcome measures and not others. A pattern of differential improvement could as easily be explained by differential expectations as by a true treatment effect.

<br><br><br>
**If I measure expectations using MTurk like you did in the paper, and the expectations map onto the pattern of improvements shown in the intervention study, does that mean expectations explain the effect?**
<br><br>
No. It is possible that the differential expectations were not strong enough to produce the improvements, or that they had no causal effect at all. However, when the pattern of expectations maps onto the pattern of intervention results, that means the study design did not eliminate differential placebo effects. It remains possible that any differences between the treatment and control groups were due to differential expectations or to the different treatments. The result is inconclusive and should be treated as such. Strong causal claims are unmerited.

<br><br><br>
**If a psychology intervention lacks adequate controls for differential expectations or placebo effects, does that mean the intervention had no effect?**
<br><br>
No. The intervention might well be effective. Our point is that the study design does not permit any confidence in conclusions about the effectiveness of the intervention. The fact that the treatment group outperformed the control group could be due to the intervention, but it also could be due to differential expectations.
Any claims about the effectiveness of the treatment should be treated as speculation that requires confirmation with a better design, one that accounts for expectation effects and any other differences between the experimental and control groups.

<br><br><br>
**Are you arguing that expectations can induce large changes in actual abilities?**
<br><br>
Perhaps they can, but that’s not our claim. Any improvement from pre-test to post-test could result from a change in the underlying abilities or it could result from a change in performance due to other factors. If you measure the same person repeatedly with the same task, you’ll get variability in the estimate of the underlying ability even with no intervention and no change in underlying abilities. You may perform better on one occasion than another because you were more engaged, better rested, found the task more interesting, had incentives to try really hard, etc. People tend to perform better with a moderate level of arousal than with high or low levels of arousal, and expectations might affect arousal levels.

In the case of interventions, we have two measurements of the same underlying trait, with a higher measurement after the intervention than before the intervention. That difference might just reflect underlying measurement variability, with expectations or arousal or interest (etc.) pushing the measure toward the top of its variability range in the post-test. If people care more, expect to improve, or try harder after training, you could get better performance even if there is no change at all in the underlying ability. They aren’t willing themselves to improve their actual abilities. Rather, they’re willing themselves to perform better on the post-test than on the pre-test because they believe the intervention should lead to better performance.

<br><br><br>
**What if expectations are necessary for a treatment to work? Wouldn’t controlling for them eliminate the treatment effect?**
<br><br>
No.
We are not suggesting that expectations for improvement must be eliminated entirely. Rather, we are arguing for the need to equate such expectations across conditions. Expectations can still affect the treatment condition in a double-blind, placebo-controlled design. And it is possible that some treatments will only have an effect when they interact with expectations. But the key to that design is that the expectations are equated across the treatment and control conditions. If the treatment group outperforms the control group, and expectations are equated, then something about the treatment must have contributed to the improvement. The improvement could have resulted from the critical ingredients of the treatment alone or from some interaction between the treatment and expectations. It would be possible to isolate the treatment effect by eliminating expectations, but that is not essential in order to claim that the treatment had an effect.

In a typical psychology intervention, expectations are not equated between the treatment and control conditions. If the treatment group improves more than the control group, we have no conclusive evidence that the ingredients of the treatment mattered. The improvement could have resulted from the treatment ingredients alone, from expectations alone, or from an interaction between the two. The results of any intervention that does not equate expectations across the treatment and control conditions cannot provide conclusive evidence that the treatment was necessary for the improvement. The improvement could be due to the difference in expectations alone. That is why double blind designs are ideal, and it is why psychology interventions must take steps to address the shortcomings that result from the impossibility of using a double blind design. It is possible to control for expectation differences without eliminating expectations altogether.
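
The logic of equating expectations can be illustrated with a toy additive model. This is only a sketch under stated assumptions: it supposes that a group's pre-to-post improvement is the sum of a retest effect, a treatment effect, an expectation effect, and a treatment-by-expectation interaction. The function name `improvement` and every numeric value are hypothetical and chosen for illustration; none comes from the paper or from any study.

```python
def improvement(treated, expectation, retest=2.0, treatment_effect=3.0,
                expectation_effect=1.5, interaction=0.5):
    """Toy additive model of pre-to-post improvement.

    All parameter values are hypothetical, chosen only for illustration.
    `expectation` is 1.0 if the group expects to improve, 0.0 if it does not.
    """
    t = 1.0 if treated else 0.0
    return (retest
            + t * treatment_effect
            + expectation * expectation_effect
            + t * expectation * interaction)


# Case 1: expectations equated across groups (both groups expect to improve).
# The group difference isolates the treatment plus its interaction with
# expectations; the shared expectation effect cancels out.
equated_diff = improvement(True, 1.0) - improvement(False, 1.0)

# Case 2: an inert treatment (no treatment effect, no interaction), but only
# the treatment group expects to improve. A group difference still appears,
# and it is driven entirely by the difference in expectations.
inert = dict(treatment_effect=0.0, interaction=0.0)
confounded_diff = (improvement(True, 1.0, **inert)
                   - improvement(False, 0.0, **inert))

print(equated_diff)     # 3.5
print(confounded_diff)  # 1.5
```

In the first case the difference between groups (3.5) can only come from the treatment or its interaction with expectations. In the second case a difference (1.5) appears even though the treatment does nothing, which is why a group difference alone is inconclusive when expectations are not equated.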
<br><br><br>
**Do placebo effects have a role to play in treatment?**
<br><br>
Expectations can have powerful effects, and they can be beneficial. Much of the research on placebo effects has focused on the role of expectations in reducing pain. From this research we know that individuals who receive what they believe to be a powerful analgesic (but that is actually a placebo) request less morphine later. At some level, if a psychological treatment works because a patient expects it to work, it still helps the patient. Moreover, some medical and psychological interventions might work, or work better, in part due to expectation effects.

Still, knowing why an intervention works can lead to improved understanding and possibly to more effective interventions. If a treatment has no effect beyond a placebo effect, consumers and researchers need to know that. Such pseudotherapies have risks and costs. Schools might change their curricula based on a successful intervention, therapists might change their approach to treatment if they believe an intervention works, companies might market their products based on interventions that appear successful, and consumers might adopt self-help strategies based on published interventions. If those interventions’ effects are driven by a placebo effect rather than an actual change to the underlying abilities or competencies, then such policy changes might fall flat.

Within science, knowing whether improvements resulted from a fundamental change in competence or from a placebo effect is crucial. Intervention studies are expensive and time consuming, and many are funded by federal agencies. If an intervention fails to distinguish between placebo-based effects and intervention-based effects, it risks distorting the scientific understanding of the mechanisms for improvement and wastes research funds and time.

<br><br><br>
**Do expectations related to a treatment always result in a *positive* change on the study’s outcome measure?**
<br><br>
Not necessarily.
And the possibility that one condition might lead to positive expectations and the other might induce negative expectations makes it especially important to equate expectations across groups. If the control task is obviously unrelated to the outcome measures or if it is implausible that it would improve performance, then participants may actually perform worse (or show no improvement from retesting) on the outcome measure due to their negative expectations. The effect of negative expectations is similar to what some have referred to as the “nocebo” effect. Given that the critical measure is the difference in improvement between the treatment group and the control group, an apparent “improvement” by the treatment group might be an illusion driven by the lack of improvement by the control group. This issue is more pernicious than confusing the mechanisms of an improvement in the treatment group because it creates the illusion of improvement even if there was not any real improvement in the treatment group at all.

We (and others) have noted an unusual pattern in many studies finding a positive benefit of action video game training. People usually perform better the second time they take a test because they have had practice with that measure (a retest effect). Yet, in many video game training studies, the control groups show no improvement at all on the outcome measure: they show no retest effect. The lack of this expected retest improvement could reflect a negative expectation for their training task (“Tetris is such a simple game that I must be in the control group”). If so, the relative improvement in the treatment group might just reflect a retest effect; they could show a relative improvement even if the training had no effect at all *and* they had no expectation that their training would improve performance. In other words, the “benefit” of the intervention could just reflect the elimination of usual retest effects in the control group.
Without an adequate control for expectations, there is no way to determine whether the relative improvement by the treatment group reflected any influence of the treatment on performance. It could just result from positive expectations by the treatment group, negative expectations by the control group, or both.

<br><br><br>
**Brain fitness software companies claim that their training software will improve cognition. Does the evidence support that claim?**
<br><br>
None of the studies used to back claims of the effectiveness of these products adequately eliminates differential placebo effects. None control for differential expectations, and none even check for such expectation differences. Most of the evidence for these products is far weaker than the evidence for video gaming interventions that we discuss in our paper. The video game training studies use active control conditions that are at least somewhat matched to the training intervention. In contrast, some of the brain training companies base their claims on correlational evidence (better performance on Task A is associated with better cognitive abilities), which provides no evidence that training with Task A will improve those cognitive abilities. Others rely on evidence from studies that used “no contact” control conditions. For the small subset of claims backed by interventions with active control conditions, not one of the studies these companies cite as support for their products adequately controls for differential expectation effects.

The popularity of these products and the revenues (and consumer costs) involved highlight the need to tighten the standards of evidence used to make causal claims about the effectiveness of an intervention.