# The wisdom of crowds in fingerprint examination

*The contributors to this project are listed alphabetically until the resulting manuscript has been submitted for publication.*

## Update

In March 2018, we modified our preregistered project here: https://osf.io/2b8fp because we realised that we needed data from 36 experts and 36 novices in order to have a complete set for analysis, given the staggered design we employed. Unfortunately, we didn't begin another round of testing fingerprint experts until February 2019, at which time we completed our expert data collection. By March 2018, we had also collected data from 32 novice participants and intended to run another 4 for a complete design. Since March 2018, we have started yoking our novice participants to our experts in each of the other experiments we're conducting in our group, so we have decided to run a new group of 36 novice participants, where each novice will see exactly the same pairs of fingerprints as their expert counterpart in exactly the same order. We plan to run these 36 novice participants May–July 2019; half will be run at The University of Queensland and half at The University of Adelaide.

Since our original preregistration, we have also been reading a variety of papers on "collective intelligence" in medical decision making, for example:

* Wolf, M., Krause, J., Carney, P. A., Bogart, A., & Kurvers, R. H. (2015). Collective intelligence meets medical decision-making: The collective outperforms the best radiologist. *PLoS ONE*, *10*(8), e0134269.
* Kurvers, R. H., Krause, J., Argenziano, G., Zalaudek, I., & Wolf, M. (2015). Detection accuracy of collective intelligence assessments for skin cancer diagnosis. *JAMA Dermatology*, *151*(12), 1346–1353.
* Kurvers, R. H., Herzog, S. M., Hertwig, R., Krause, J., Carney, P. A., Bogart, A., … & Wolf, M. (2016). Boosting medical diagnostics by pooling independent judgments. *Proceedings of the National Academy of Sciences*, *113*(31), 8777–8782.
* Kämmer, J. E., Hautz, W. E., Herzog, S. M., Kunina-Habenicht, O., & Kurvers, R. H. (2017). The potential of collective intelligence in emergency medicine: Pooling medical students' independent decisions improves diagnostic performance. *Medical Decision Making*, *37*(6), 715–724.

Two things are clear from these findings. First, there is great potential for collective intelligence in forensic decision making given its success in medical decision making, since the two domains take a similar approach to diagnostic judgement. Second, there are several different aggregation methods (e.g., majority, quorum, weighted quorum, combining confidence ratings or yes/no judgements), and a wide variety of ways to report the results (e.g., in terms of the true and false positive rates, sensitivity, specificity, compared to the mean or best individual performance, based on every group size or just odd numbers), so this area of collective intelligence seems particularly well suited to data dredging and outcome switching. Before we start analysing our results and collecting the data from our novice participants, we wanted to pause and update our preregistration to indicate exactly what analyses we've done to date, what analyses we plan to conduct, update our predictions, and specify how we intend to report our results.

## Data collection to date

We have finished collecting data from 36 expert fingerprint examiners. Since we're planning to present the expert results at the Society for Applied Research in Memory and Cognition 2019 conference in June, we had a look at the modal performance of experts when they were presented with 44 pairs of prints for 20 seconds (91.67%) and when they were presented with 4 pairs of prints for an untimed duration (97.92%).
We have not looked at the results from the unyoked novices we collected. As indicated above, we have decided to run a new group of 36 novice participants, where each novice will see exactly the same pairs of fingerprints as their expert counterpart in exactly the same order, between May and July 2019; half will be run at The University of Queensland and half at The University of Adelaide. To date, we have collected data from 6 of these novice participants at The University of Adelaide, but we have not looked at the data.

## Rationale and design, Participants, Materials, and Procedure

All of these details are available in our preregistered project here: https://osf.io/2b8fp.

### Collective Intelligence Rules

Several of the above papers in medical decision making have demonstrated a significant boost in collective performance by using a variety of methods for aggregating the independent judgments of examiners. We have decided to examine the aggregation methods that are most appropriate for the context of collective decision making in fingerprint identification:

1. **Follow-the-Majority Rule**: adopt the judgment with the most support in the group (applied from a group size of 3 upward, as in Kämmer et al., 2017). For example, a random group of five examiners might provide ratings of "Same," "Same," "Same," "Different," "Different." Since the majority (three) of the five examiners provided a "Same" judgement, this "Same" decision would be adopted.
2. **Follow-the-Most-Confident Rule**: adopt the judgment with the highest confidence rating. For example, a random group of five examiners might provide ratings of 7, 10, 9, 8, and 1. Even though four of the five examiners provided a "Same" judgement (≥ 7), the "Different" rating of 1 (≤ 6) is the most extreme of the five, so this highly confident examiner's decision would be adopted. If people are equally confident about the two options, then one will be selected at random. (Note: given our 12-point scale, a confidence rating of, say, 1 "Sure Different" and 12 "Sure Same" have the same level of confidence, so one of these ratings would be selected at random.)
3. **Follow-the-Most-Senior Rule**: adopt the judgment of the most senior examiner (i.e., the person with the greater number of years examining prints). For example, a random group of five examiners might have 7, 10, 9, 8, and 25 years of experience. Even though the four less experienced examiners each provided a "Same" judgement, the most experienced examiner (25 years) provided a "Different" rating, so that examiner's decision would be adopted. If examiners have the same level of experience, the response of one examiner will be selected at random. Since none of the novice control participants have any experience with fingerprints, this rule will only apply to professional examiners.

## Planned Analyses

As indicated in our preregistration, we will be measuring accuracy, response time, confidence, and response bias, but we will not be examining the effects of trial-by-trial response time here. In our previous preregistration, we indicated that *"We will be calculating hit and false alarm rates and discriminability (e.g. A prime) to examine our hypotheses. We will conduct a 2 Task type (crowd, single) x 2 Expertise (novice, expert) mixed ANOVA for discriminability, collapsing across matches and non-matches,"* but instead of computing *A*′, we'll use a corrected formula that was introduced by Zhang and Mueller (2005) and is denoted *A* (see also Searston, Thompson, Vokey, French, & Tangen, 2019). The new formula can be correctly interpreted as the average of the maximum area and minimum area under the proper receiver operating characteristic curve constrained by the hits and false alarms.
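To make this measure concrete, here is a minimal Python sketch of our reading of the piecewise Zhang and Mueller (2005) formula for *A* from a hit rate H and false alarm rate F. The function name is ours, and the reflection for below-chance performance (H < F) is a common convention rather than part of the published statement:

```python
def zhang_mueller_A(H, F):
    """Zhang & Mueller's (2005) corrected discriminability measure A,
    from a hit rate H and false alarm rate F (both in (0, 1) or the
    endpoints 0/1 with H >= F). A = 0.5 at chance, 1.0 at perfection."""
    if H < F:
        # Below-chance responder: reflect (a convention, not part of the paper).
        return 1.0 - zhang_mueller_A(F, H)
    if F <= 0.5 <= H:
        return 0.75 + (H - F) / 4 - F * (1 - H)
    if H < 0.5:
        # F <= H < 0.5
        return 0.75 + (H - F) / 4 - F / (4 * H)
    # 0.5 < F <= H
    return 0.75 + (H - F) / 4 - (1 - H) / (4 * (1 - F))
```

For example, a perfect examiner (H = 1, F = 0) yields *A* = 1.0, a guesser (H = F = 0.5) yields 0.5, and H = 0.9, F = 0.1 yields 0.94.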
We'll start by computing the hit and false alarm rates for each of the 36 individual experts and novices based on the 22 targets and 22 distractors they examined, as well as their corresponding discriminability (*A*) and bias (*B*) scores. (Note: the 36 novices and experts each provided 20-second timed ratings of the 44 prints and untimed ratings of 4 prints, so each rated 48 prints.) Next, we'll take a random sample of 3 experts and apply the Follow-the-Majority Rule to arrive at a binary "Same"/"Different" aggregated decision (e.g., if 2 out of 3 people say "Same" on a single trial, that rating represents the crowd's rating on that trial) for these three people for each of the 22 timed targets and 22 timed distractors they examined. We'll then compute the crowd's discriminability (*A*) and bias (*B*) scores based on their 44 aggregated ratings. We'll repeat this process for 36 aggregated random novices with a group size of 3. Then we'll increase the crowd size to 5 and repeat the process so we have another set of 44 aggregated scores for each of the 36 aggregated random experts and 36 aggregated random novices, and so on for each odd number up to 35.

### Discriminability

We'll then conduct a 2 (Expertise: novice, expert) x 18 (Crowd Size: 1, 3, 5, 7, … 35) ANOVA on the crowds' discriminability (*A*) scores to measure the effect of expertise and crowd size on discriminability. Based on our previous work, we expect that individual experts will perform much better than individual novices (at a crowd size of 1). We also expect a classic Wisdom of the Crowd effect, where discriminability will increase sharply as a function of Crowd Size and likely reach asymptote at roughly 5 people. As we predicted earlier, we expect the optimal crowd of novices (at peak discrimination) will be more accurate than individual novices, but still less accurate than individual experts.
We expect that an optimal crowd of experts (at peak discrimination) will perform nearly perfectly (*A* = 1.0).

As described above, each expert and novice was presented with 44 pairs of prints (22 targets and 22 distractors), which they examined for 20 seconds. Each person also examined another 4 pairs of prints (2 targets and 2 distractors) for as long as they wished. The experiment was designed so that 3 novices and experts rated 4 new pairs of prints untimed, equating to 44 untimed ratings across the 36 novices and experts. We will conduct a 2 (Expertise: novice, expert) x 18 (Crowd Size: 1, 3, 5, 7, … 35) ANOVA on the crowds' discriminability (*A*) scores to measure the effect of expertise and crowd size on discriminability when participants are provided with unlimited time to make a decision.

### Bias

We'll conduct a separate 2 (Expertise: novice, expert) x 18 (Crowd Size: 1, 3, 5, 7, … 35) ANOVA on the bias (*B*) scores to measure the effect of expertise and crowd size on bias. Based on our previous work, we expect that individual experts will be more conservative than individual novices, saying "Different" much more often regardless of whether the prints are the same or different (at a crowd size of 1). We also expect these bias scores to decrease as a function of Crowd Size and likely reach asymptote at a neutral level at roughly 5 people (mirroring the sharp increase in discriminability scores). As above, we will conduct a 2 (Expertise: novice, expert) x 18 (Crowd Size: 1, 3, 5, 7, … 35) ANOVA on the crowds' bias (*B*) scores to measure the effect of expertise and crowd size on bias when participants are provided with unlimited time to make a decision.
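As an illustration of the crowd resampling described under Planned Analyses, here is a minimal Python sketch. The data structures (a dict of binary "Same" judgements per participant, plus a target/distractor flag per trial), the toy response probabilities, and all names are invented for the example; in the planned analysis each crowd's hit and false alarm rates would then feed the *A* and *B* calculations, with 36 resampled crowds per size:

```python
import random

def majority_vote(judgements):
    """Follow-the-Majority: True ("Same") iff more than half say "Same".
    Assumes an odd group size, so no ties are possible."""
    return sum(judgements) > len(judgements) / 2

def crowd_hit_fa(ratings, is_target, crowd_size, rng):
    """Draw one random crowd, aggregate its binary "Same" judgements
    trial by trial with the majority rule, and return the crowd's
    (hit rate, false alarm rate).

    ratings   : dict mapping participant id -> list of booleans
                (True = "Same"), one per trial, all in the same order
    is_target : list of booleans, True where the pair really matches
    """
    crowd = rng.sample(list(ratings), crowd_size)
    hits = fas = n_targets = n_distractors = 0
    for t, target in enumerate(is_target):
        decision = majority_vote([ratings[p][t] for p in crowd])
        if target:
            n_targets += 1
            hits += decision
        else:
            n_distractors += 1
            fas += decision
    return hits / n_targets, fas / n_distractors

# Toy data: 36 "participants", 44 trials (22 targets then 22 distractors),
# with invented hit/false-alarm probabilities of .9 and .2.
rng = random.Random(1)
is_target = [True] * 22 + [False] * 22
ratings = {p: [rng.random() < (0.9 if t else 0.2) for t in is_target]
           for p in range(36)}

# Sweep the odd crowd sizes 1, 3, ..., 35, one resampled crowd per size.
for size in range(1, 36, 2):
    H, F = crowd_hit_fa(ratings, is_target, size, rng)
```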
### Aggregation Method

Once we've established the optimal crowd size for experts using the Follow-the-Majority Rule, we'll compare this aggregation rule to the Follow-the-Most-Confident and Follow-the-Most-Senior Rules with a 2 (Expertise: novice, expert) x 3 (Aggregation Rule: majority, confidence, seniority) ANOVA at this optimal crowd size to determine which rule is best in terms of discriminability. Given the results of Kurvers et al. (2016) and Kämmer et al. (2017), we expect the Follow-the-Majority Rule to produce the best performance compared to both the Follow-the-Most-Confident Rule and the Follow-the-Most-Senior Rule. We'll repeat this analysis for the bias scores to see how the three methods compare at this optimal crowd size. As above, we will conduct a 2 (Expertise: novice, expert) x 3 (Aggregation Rule: majority, confidence, seniority) ANOVA on the discriminability and bias scores to see whether the best aggregation rule depends on whether participants are permitted to examine these pairs of prints for a limited amount of time or not.

## Generalisation

We have recently conducted a protocol analysis on 44 qualified experts (https://osf.io/zaekm). Unlike the above experiment, where everyone examined the same target and distractor pairs, in this new experiment each expert was presented with 2 easy, 2 moderate, and 2 difficult pairs of prints (3 targets and 3 distractors), randomly sampled from a larger set of 20 easy (10 targets, 10 distractors), 20 moderate (10 targets, 10 distractors), and 20 difficult (10 targets, 10 distractors) pairs, and they were given as much time as they needed to make a decision while thinking aloud. We are roughly halfway through testing a group of 44 age- and gender-matched controls on exactly the same items, presented in exactly the same order.
Once we have analysed the results from the above experiment, we will conduct the same collective intelligence analyses on the results from this new experiment, comparing the yoked novices to experts, to see whether this wisdom of the crowd effect generalises to actual crime scene photographs that have been handpicked by our police collaborators, where the "difficult" items were selected to challenge even the most experienced professional examiners. Of the 40 moderate and difficult fingerprint pairs that were presented, each pair was examined by between 2 and 11 participants. To conduct the wisdom of the crowd analysis on these data, we'll eliminate the one moderate target trial with only 2 ratings, and compute the discriminability scores on the remaining 19 targets and 20 distractors, each of which has at least 3 ratings, randomly sampling 3 ratings per pair. We'll repeat the above analyses on the discrimination scores, but we'll only compare each individual's performance to a crowd size of 3 and compare the three aggregation methods listed above.
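The three aggregation rules to be compared can be sketched in Python as follows. This is an illustrative implementation under assumed representations (each group member's response as a confidence rating on our 12-point scale, plus years of experience for the seniority rule); the function names are ours:

```python
import random

# Ratings are on the 12-point scale: 1 = "Sure Different" ... 12 = "Sure Same".
# Ratings >= 7 count as a "Same" judgement, <= 6 as "Different".
def is_same(rating):
    return rating >= 7

def follow_the_majority(ratings):
    """Adopt the judgement with the most support (odd group, so no ties)."""
    same_votes = sum(is_same(r) for r in ratings)
    return same_votes > len(ratings) / 2

def follow_the_most_confident(ratings, rng=random):
    """Adopt the judgement of the most extreme rating, measured as
    distance from the scale midpoint (6.5); ties in extremity
    (e.g. 1 vs 12) are broken at random."""
    extremity = max(abs(r - 6.5) for r in ratings)
    most = [r for r in ratings if abs(r - 6.5) == extremity]
    return is_same(rng.choice(most))

def follow_the_most_senior(ratings, years, rng=random):
    """Adopt the judgement of the examiner with the most years of
    experience; ties broken at random. Experts only, since novices
    have no examination experience."""
    top = max(years)
    senior = [r for r, y in zip(ratings, years) if y == top]
    return is_same(rng.choice(senior))
```

On the worked example from the rules above (ratings 7, 10, 9, 8, 1), the majority rule returns "Same", while the most-confident rule returns "Different" because the rating of 1 is the most extreme; with experience of 7, 10, 9, 8, and 25 years, the most-senior rule also returns "Different".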