*1st 750 participants*
After excluding participants who failed the seriousness check, the final sample size was 734. Consistent with our prediction, participants rated Team B as significantly less qualified when they were told ‘your assistant has rated Team B as significantly less qualified than Team A’ (M = 5.971, SD = 2.463, n = 377) than when they were told ‘your assistant has rated Team A as significantly more qualified than Team B’ (M = 6.524, SD = 2.031, n = 357; separate-variance t(718.595) = 3.326, p < .001; d = -.244, 95% CI = -.39 to -.099).
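The statistics above can be recomputed from the reported means, standard deviations, and cell sizes. The following is a minimal sketch, not the analysis script used for the study: the function name `welch_from_summary` is hypothetical, the degrees of freedom use the Welch-Satterthwaite approximation, and Cohen's d is assumed to be based on the pooled standard deviation. The sign conventions (t as second condition minus first, d as first minus second) are chosen only to match the signs reported in the text.

```python
import math


def welch_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Return (t, df, d) for two independent groups from summary statistics.

    t  : separate-variance (Welch) t statistic, computed as (m2 - m1) / SE
    df : Welch-Satterthwaite degrees of freedom
    d  : Cohen's d with pooled SD, computed as (m1 - m2) / SD_pooled
         (pooled-SD d is an assumption, not confirmed by the source)
    """
    # Squared standard error contributed by each group
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2

    # Welch t statistic: no equal-variance assumption
    t = (m2 - m1) / math.sqrt(v1 + v2)

    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    # Pooled standard deviation for the standardized mean difference
    pooled_sd = math.sqrt(
        ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    )
    d = (m1 - m2) / pooled_sd

    return t, df, d


# First 750 participants: "Team B less qualified" framing vs.
# "Team A more qualified" framing, using the values reported above.
t, df, d = welch_from_summary(5.971, 2.463, 377, 6.524, 2.031, 357)
print(f"t({df:.3f}) = {t:.3f}, d = {d:.3f}")
```

Because the inputs are rounded to three decimals, the recomputed values can differ from the reported ones in the last digit.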
*2nd 750 participants*
The second batch of participants (764, collected to reach the full sample size of 1,500) showed the same basic results as the first 750. Participants rated Team B as significantly less qualified when they were told ‘your assistant has rated Team B as significantly less qualified than Team A’ (M = 5.868, SD = 2.615, n = 394) than when they were told ‘your assistant has rated Team A as significantly more qualified than Team B’ (M = 6.6, SD = 2.234, n = 369; separate-variance t(754.725) = 4.159, p < .001; d = -.3, 95% CI = -.443 to -.157).
*Full 1500*
The magnitude of the effect did not differ significantly between the first and second rounds of data collection (p > .46). As in each round individually, when the data were aggregated across all participants, people rated Team B as significantly less qualified when they were told ‘your assistant has rated Team B as significantly less qualified than Team A’ (M = 5.918, SD = 2.541, n = 771) than when they were told ‘your assistant has rated Team A as significantly more qualified than Team B’ (M = 6.561, SD = 2.134, n = 727; separate-variance t(1476.64) = 5.314, p < .001; d = -.273, 95% CI = -.375 to -.172).
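As a rough cross-check on the comparison between rounds, the two reported effect sizes can be compared with a z-test on their difference. This is only an illustrative sketch and is not the test that produced the p-value reported above, which the text does not specify. It assumes the two rounds are independent samples, that each 95% CI is symmetric and normal-based (so its half-width is 1.96 standard errors), and the function name `compare_effect_sizes` is hypothetical.

```python
import math


def compare_effect_sizes(d1, ci1, d2, ci2, z_crit=1.959964):
    """Approximate z-test for the difference between two independent d values.

    ci1 and ci2 are (lower, upper) 95% confidence limits. Standard errors
    are back-calculated from the CI width, which assumes symmetric,
    normal-based intervals.
    """
    se1 = (ci1[1] - ci1[0]) / (2 * z_crit)
    se2 = (ci2[1] - ci2[0]) / (2 * z_crit)

    # Difference in effect sizes relative to its standard error
    z = (d1 - d2) / math.sqrt(se1 ** 2 + se2 ** 2)

    # Two-sided p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Effect sizes and confidence limits as reported for the two rounds
z, p = compare_effect_sizes(-0.244, (-0.39, -0.099), -0.3, (-0.443, -0.157))
print(f"z = {z:.3f}, two-sided p = {p:.3f}")
```

Under these assumptions the difference between rounds is well short of significance, in line with the non-significant comparison reported above, although the approximate p-value should not be expected to reproduce the authors' exact figure.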