Results and Discussion
----------------------
**Detailed Sample Description (including details about changes to the original protocol; identical to published version)**
For RRR Study 1, participants were recruited from the introductory psychology participant pool at the University of Münster in Germany, and they participated as one option for earning course credit. For RRR Study 2, approximately 20% of the participants were recruited from the participant pool; the remaining 80% were recruited from the broader campus and were compensated €6 for participating. We recruited from our participant pool and community without specifying restrictions on race or age, so our total sample included an additional 55 participants (20 in Study 1 and 35 in Study 2) who did not meet the inclusion criteria for this RRR. Data from those participants are included on our OSF page. Additionally, the data on the OSF page include participants who did not understand the nature of the event as well as a sample (*N* = 36) from our initial attempt to run Study 1 with the original English-language version of the video.
Given that our participants were not native English speakers, the second author translated all instructions into German, and a bilingual student assistant independently translated them back into English to verify the accuracy of the translation. Based on a small, informal pretest, we initially assumed that our participants would be able to understand the video with the original soundtrack and therefore did not dub it. However, we added a question at the end of the study to verify that participants understood that the video depicted a bank robbery. Because many participants did not understand the nature of the event depicted in the original video, we changed the protocol and replaced the English audio track with a German translation. We informed the editors about this breach of protocol and excluded all participants who watched the original version of the video from the final sample. In accordance with our preregistered plan, we also excluded any participants who did not understand the nature of the video. Because of the need for this change to Study 1, we were unable to reach the preregistered 50 participants per condition (final sample: *n* = 46 in the control and *n* = 41 in the experimental condition). Similarly, because we had to recruit most participants for Study 2 from outside the psychology department, we did not reach our goal of 50 participants per condition after excluding problem cases (final sample: *n* = 46 in the control and *n* = 43 in the experimental condition). In all other respects, our procedure followed the standard protocol.
**Results**
For both studies, we report (1) Chi-square tests comparing the percentage of correct identifications in the experimental and control conditions, (2) Chi-square tests comparing the ratio of the percentage selecting the wrong face (misidentification) to the percentage indicating “not present” across the experimental and control conditions, and (3) 2 (condition: control vs. experimental) x 2 (line-up choice: correct vs. incorrect) ANOVAs on confidence ratings. In addition to reporting these analyses for the final approved sample, we also report all analyses for our complete sample. The latter set of analyses includes participants who did not meet the inclusion criteria specified in the approved protocol (i.e., those who did not meet the age criteria or who did not understand the event depicted in the video).
STUDY 1
-------
**Final Sample**
*Correct identifications.* In the control condition 52.17 % of participants (vs. 36.59 % in the experimental condition) correctly identified the target person, Chi-square (1) = 2.13, *p* = .144.
*False rejections versus wrong identifications.* In addition to choosing one of the eight persons from the line-up, participants could also choose the option "target not present" and thus falsely reject the target person (instead of choosing the wrong person). When comparing these two kinds of errors, we found that 31.82 % (*n* = 7) of the participants in the control condition who made an error indicated that the target was not present in the line-up. In contrast, more than twice that percentage (65.38 %; *n* = 17) of the participants in the experimental condition who made an error did not believe the target was present in the line-up, Chi-square (1) = 5.371, *p* = .020.
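The Chi-square values reported here can be checked directly from the cell counts. The following is a minimal sketch, assuming a Pearson Chi-square test without continuity correction (an inference from the reported values, not something stated in the protocol). The function name `pearson_chi2_2x2` is hypothetical, and the counts of wrong identifications (15 and 9) are derived from the reported percentages rather than reported directly (7 is 31.82 % of 22 errors; 17 is 65.38 % of 26 errors).

```python
import math


def pearson_chi2_2x2(table):
    """Pearson chi-square (no continuity correction) for a 2x2 table of counts.

    Returns (chi2, p) for df = 1.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # For df = 1 the chi-square survival function reduces to erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Study 1, final sample, error types.
# Rows: control, experimental. Columns: "target not present", wrong person.
# The second column is derived from the reported percentages (assumption).
errors_study1 = [[7, 15], [17, 9]]

chi2, p = pearson_chi2_2x2(errors_study1)
print(f"Chi-square (1) = {chi2:.3f}, p = {p:.3f}")
```

Under these assumptions the sketch should reproduce the Chi-square (1) = 5.371 reported above. The same function can be applied to the other 2 x 2 comparisons in this section once their cell counts are reconstructed.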
*Confidence ratings.* A 2 (line-up choice: correct vs. incorrect) x 2 (condition: control vs. experimental) ANOVA on participants' confidence ratings yielded only a significant main effect of choice, *F* (1,83) = 5.533, *p* = .021, partial-eta squared = .062, indicating that participants who correctly identified the target (*M* = 4.79, *SD* = 1.24) were more confident in their judgment than participants who made an incorrect choice (*M* = 4.13, *SD* = 1.96). There were no further effects, *F*s < 1.479.
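The partial-eta squared values in this section can be recovered from the reported *F* statistics and degrees of freedom, using the identity partial-eta squared = (*F* x df_effect) / (*F* x df_effect + df_error). The sketch below is only a consistency check on the reported values; the function name `partial_eta_squared` is hypothetical and is not part of the original analysis scripts.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Main effect of line-up choice, Study 1, final sample: F(1, 83) = 5.533.
eta_p2 = partial_eta_squared(5.533, 1, 83)
print(f"partial-eta squared = {eta_p2:.3f}")
```

For the main effect of choice in the Study 1 final sample, this gives a value that rounds to the reported .062.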
**Full Sample**
When running the same analyses across all participants, irrespective of the language version of the video, their age, or their understanding of the event depicted in the video (*N* = 141), the above-described pattern became stronger.
*Correct identifications.* In our full sample, 52.86 % (*n* = 37) of participants in the control condition versus 38.03 % (*n* = 27) in the experimental condition correctly identified the target person; 33 participants in the control condition and 44 in the experimental condition made an incorrect choice, Chi-square (1) = 3.127, *p* = .077.
*False rejections versus wrong identifications.* Similarly, in the control condition 39.39 % (*n* = 13) of participants who were not able to correctly identify the target chose “target not present”, whereas in the experimental condition 63.64 % (*n* = 28) of those participants falsely rejected the line-up, Chi-square (1) = 4.452, *p* = .035.
*Confidence ratings.* The pattern of results for the confidence ratings did not change much. Again, only line-up choice had a significant effect, with participants being more confident in their choice after selecting the right (*M* = 4.66, *SD* = 1.54) versus wrong (*M* = 4.12, *SD* = 1.31) target, *F* (1,137) = 4.886, *p* = .029, partial-eta squared = .034. There were no other significant effects, *F*s < 1.
STUDY 2
-------
**Final Sample**
*Correct identifications.* In the control condition 56.52 % (*n* = 26) of participants versus 34.88 % (*n* = 15) in the experimental condition correctly identified the target person, Chi-square (1) = 4.188, *p* = .041.
*False rejections versus wrong identifications.* In the control condition 45 % of participants who were not able to correctly identify the target chose the option “target not present”, compared to 71.43 % who did so in the experimental condition, Chi-square (1) = 3.41, *p* = .065.
*Confidence ratings.* The 2 (line-up choice: correct vs. incorrect) x 2 (condition: control vs. experimental) ANOVA yielded only a significant main effect of choice, *F* (1, 85) = 7.462, *p* = .008, partial-eta squared = .081. Again, participants who correctly identified the target (*M* = 4.80, *SD* = 1.616) were more confident in their judgment than participants who chose the wrong target (*M* = 3.88, *SD* = 1.645). There were no further effects, *F*s < 1.
**Full Sample**
When running the same analyses using our full sample (*N* = 124), the above-described findings became stronger.
*Correct identifications.* In the control condition 54.55 % (*n* = 36) of participants correctly identified the target person, compared to 36.21 % (*n* = 21) in the experimental condition, Chi-square (1) = 4.180, *p* = .041.
*False rejections versus wrong identifications.* Analogously, 43.33 % (*n* = 13) of participants who were not able to correctly identify the target in the control condition versus 67.57 % (*n* = 25) in the experimental condition falsely chose “target not present”, Chi-square (1) = 3.963, *p* = .046.
*Confidence ratings.* Finally, once more the 2 x 2 ANOVA produced only the main effect of line-up choice, *F* (1,120) = 8.189, *p* = .005, partial-eta squared = .064. Again, participants who correctly identified the target (*M* = 4.89, *SD* = 1.472) were more confident in their judgment than participants who chose the wrong target (*M* = 4.07, *SD* = 1.654). There were no further effects, *F*s < 1.