UCSB Replication of UVA wave 3 ‘Predicting: Exploring vs. Confirming Data’
John Protzko, Jonathan W. Schooler
Participants (N = 1500) were randomly assigned to take the study as part of the 1st 750 or the 2nd 750. Within each 750, participants read about past election results of the U.S. House of Representatives in terms of Republicans and Democrats winning or losing seats overall. For each election, the following information was provided: the number of Republicans and Democrats in the House of Representatives at the time of the election; whether a Republican won the Presidency during that election or was already in office at the time of the election (otherwise known as a Midterm Election); the average rate of unemployment during that election year; the average rate of economic growth during that election year; and the average rate of currency inflation during that election year. Participants were instructed to take this information into account when drawing their conclusions about which factors predict which election outcomes.
In both conditions, participants predicted the results of a future election in the U.S. House of Representatives while varying the following factors (all presented simultaneously): More Democrats in the House of Representatives; More Republicans in the House of Representatives; Republican won the Presidency; Republican in office (Midterm Election); Unemployment is High; Unemployment is Low; GDP Growth is High; GDP Growth is Low; Inflation is High; Inflation is Low; The country is at War; The country is NOT at War.
In the confirmatory condition, participants first made predictions and were then given the opportunity to revise them, after seeing the final results, to form their final impressions of which factors matter. In the exploratory condition, participants saw the final results of the election and ‘predicted’ (after seeing the results) which factors influenced the outcome. The hypothesis was that initial predictions would be higher than final predictions, and (critically) that being reminded of initial predictions would cause people to stick to their initial predictions more, causing an increase in the number of factors participants believed would influence the election (either positively or negatively). Consistent with our pre-registration plan, we analyzed those who were part of the 2nd 750 first, followed by those who were part of the 1st 750, then combined the data for the full 1500. The strength of the effect was not statistically different between the two 750s (p > .39).
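One way such a between-wave comparison can be implemented (the sketch below is illustrative only; the exact model and software behind the p > .39 comparison are not specified here, and the variable names are assumptions) is to regress the number of factors on condition, wave, and their interaction, with the interaction term testing whether the effect differs between the two 750s.

```python
# Illustrative sketch only: a condition-by-wave interaction as one way to compare
# the strength of the effect across the two 750s. Column names are assumed.
import pandas as pd
import statsmodels.formula.api as smf

def compare_waves(df: pd.DataFrame):
    # confirm: 1 = reminded of initial predictions, 0 = allowed to explore
    # wave:    0 = 1st 750, 1 = 2nd 750
    fit = smf.ols("n_factors ~ confirm * wave", data=df).fit()
    return fit.params["confirm:wave"], fit.pvalues["confirm:wave"]
```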
2nd 750
The replication model is a pooled-variance, independent samples t-test. Inconsistent with the prediction, being reminded of their initial predictions (M = 8.107, SD = 4.285, n = 373) did not alter how many factors participants believed would influence an election (M = 7.751, SD = 3.763, n = 377; t (748) = 1.211, p > .22; d = .086, 95%CI = .232 to -.055; see Figure 1).
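A minimal sketch of this replication model in Python follows; the original analyses were not necessarily run in Python, and the variable names, the pooled-SD formula for Cohen's d, and the default JZS prior for the Bayes factors reported in the figure captions are assumptions rather than the authors' code.

```python
# Sketch of the replication model: pooled-variance independent-samples t-test,
# Cohen's d, and a default (JZS) Bayes factor. Names and priors are assumed.
import numpy as np
from scipy import stats
import pingouin as pg

def replication_test(confirm, explore):
    """confirm / explore: arrays of factor counts (0-12) per participant."""
    confirm, explore = np.asarray(confirm, float), np.asarray(explore, float)
    t, p = stats.ttest_ind(confirm, explore, equal_var=True)     # pooled variance
    n1, n2 = len(confirm), len(explore)
    pooled_sd = np.sqrt(((n1 - 1) * confirm.var(ddof=1) +
                         (n2 - 1) * explore.var(ddof=1)) / (n1 + n2 - 2))
    d = (confirm.mean() - explore.mean()) / pooled_sd             # Cohen's d
    bf10 = float(pg.bayesfactor_ttest(t, n1, n2))                 # evidence for the effect
    return {"t": t, "p": p, "d": d, "BF01": 1.0 / bf10}           # BF01 favors the null
```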
Figure 1: Number of Factors participants in the 2nd 750 group believed would influence a House of Representatives election based on whether they had to confirm their predictions first or they were allowed to explore the final data and make their prediction. Results were the same in both groups (BF = 4.709 in favor of the null).
We had pre-registered to also test the results using a Poisson regression to accommodate the count nature of the data. As can be seen in Figure 1, however, the data do not conform to a Poisson distribution (Pearson goodness-of-fit χ²(749) = 1341.154, p < .001). Instead, the data are deeply negatively skewed and clustered at the maximum number of factors, suggesting many participants saw all 12 factors as significant influences on the U.S. House of Representatives election. The data are censored at the maximum number of factors. To properly test such a non-normal yet also non-Poisson distribution with such high negative skew, we first transform the data to make them positively skewed (subtracting the number of factors from 12, so the excess of 12s now corresponds to an excess of zeroes). We then test whether there is an inflation of zeroes above what would be expected under a negative-binomial distribution. As there is evidence of an excessive number of zeroes (really, recoded 12s; Z_Vuong = 6.53, p < .001), we analyze the transformed data using a zero-inflated negative binomial model (Long & Freese, 2006).
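As a hedged illustration of this pipeline (the models were not necessarily fit in Python, and the column names and fitting options below are assumptions), the reflection of the scale and the zero-inflated negative binomial fit could look as follows in statsmodels, with the 'inflate' coefficients corresponding to the logit component reported below and the remaining coefficients to the negative-binomial component.

```python
# Illustrative sketch: reflect the 0-12 scale so the pile-up at 12 becomes a pile-up
# at 0, then fit a zero-inflated negative binomial with the condition in both parts.
# Variable names and fitting options are assumptions, not the authors' code.
import numpy as np
import statsmodels.api as sm
from statsmodels.discrete.count_model import ZeroInflatedNegativeBinomialP

def fit_zinb(n_factors, confirm):
    """n_factors: counts 0-12; confirm: 1 = reminded of predictions, 0 = explore."""
    y = 12 - np.asarray(n_factors)                 # excess 12s become excess zeroes
    X = sm.add_constant(np.asarray(confirm))
    model = ZeroInflatedNegativeBinomialP(y, X, exog_infl=X, inflation="logit")
    return model.fit(method="bfgs", maxiter=500)

# res = fit_zinb(df["n_factors"], df["confirm"])
# res.summary()  # 'inflate_*' rows = logit (all-12) part; other rows = NB part
```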
In the 2nd 750 participants, after accounting for the right-censoring and negative skew of the data, the results are as follows. First, for responses below the absolute maximum (12), there was no reliable difference between having to pre-register and be reminded of that prediction and being allowed to explore the data (bnegative-binomial (520) = .111, p > .05). For values at the maximum, however, the results were different: being reminded of their initial predictions made participants more likely to report believing that all 12 predictors were influential than if they were allowed to explore the final results (blogit (226) = .73, p < .001; see Figure 2).
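For intuition (assuming, as is standard, that the logit coefficient is on the log-odds scale), the estimate of .73 can be exponentiated into an odds ratio:

```python
import math
# Illustrative reading only: exponentiating the logit coefficient gives an odds ratio;
# being reminded of one's predictions roughly doubles the odds of selecting all 12 factors.
print(math.exp(0.73))  # ≈ 2.08
```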
Figure 2: Cumulative density functions of the number of factors participants in the 2nd 750 group believed would influence a U.S. House of Representatives election based on whether they had to confirm their predictions (cobalt distribution) first or they were allowed to explore (ebony distribution) the final data and make their prediction. A zero-inflated negative binomial model accounting for censoring and skew shows no difference in the region below the maximum (BF = 1.399 in favor of the null) but that having to confirm one’s prediction makes people more likely to report that they thought all 12 factors would influence the election (blogit (226) = .73, p < .001). The black dashed reference line differentiates those at the extreme from the rest of the distribution.
1st 750
In the 1st 750 participants, however, we were able to confirm the effect in question using the replication model of an independent-samples, pooled-variance t-test. Being reminded of their initial predictions caused participants to believe more factors mattered to the outcome of a U.S. House of Representatives election (M = 8.734, SD = 3.89, n = 365) than if they were allowed to explore the results (M = 8.034, SD = 3.829, n = 385; t (748) = 2.485, p < .014; d = .182, 95%CI = .325 to .038).
The results from the zero-inflated negative binomial model, accounting for the skew and censoring of the distribution, were the same as in the 2nd 750. For responses below the maximum, there was no effect of having to confirm versus being allowed to explore the data (bnegative-binomial (499) = -.058, p > .38). For responses at the maximum, however, being reminded of their initial predictions made participants more likely to report believing that all 12 predictors were influential than if they were allowed to explore the final results (blogit (247) = .477, p = .01; see Figure 3).
Figure 3: Cumulative density functions of the number of factors participants in the 1st 750 group believed would influence a U.S. House of Representatives election based on whether they had to confirm their predictions (cobalt distribution) first or they were allowed to explore (ebony distribution) the final data and make their prediction. A zero-inflated negative binomial model accounting for censoring and skew shows no difference in the region below the maximum (BF = 5.514 in favor of the null) but that having to confirm one’s prediction makes people more likely to report that they thought all 12 factors would influence the election (blogit (247) = .477, p = .01). The black dashed reference line differentiates those at the extreme from the rest of the distribution.
Full 1500
In the complete sample, using the replication model of a t-test, we were able to replicate the effect. Being reminded of one’s initial predictions, having to ‘pre-register’ them, caused people to believe more factors influence the U.S. House of Representatives election (M = 8.417, SD = 4.104, n = 738) than if they had not been reminded and instead been allowed to explore the results (M = 7.894, SD = 3.797, n = 762; t (1498) = 2.566, p < .011, d = .132; 95%CI = .234 to .031). Accounting for the censoring and skewed distribution using a zero-inflated negative binomial model returned the same result as in the individual 750s. Being reminded of one’s initial predictions before seeing the data had no effect on the number of factors believed to affect elections for responses below the extreme of all 12 factors (bnegative-binomial (1021) = .033, p > .45). When predicting all 12 choices, being reminded of one’s initial predictions made participants more likely to report believing that all 12 predictors were influential than if they were allowed to explore the final results (blogit (475) = .605, p < .001; see Figure 4).
Figure 4: Cumulative density functions of the number of factors participants believed would influence a U.S. House of Representatives election based on whether they had to confirm their predictions (cobalt distribution) first or they were allowed to explore (ebony distribution) the final data and make their prediction. A zero-inflated negative binomial model accounting for censoring and skew shows no difference in the region below the maximum (BF = 8.607 in favor of the null) but that having to confirm one’s prediction makes people more likely to report that they thought all 12 factors would influence the election (blogit (475) = .605, p < .001). The black dashed reference line differentiates those at the extreme from the rest of the distribution.
Thus, we can say we have replicated the overall result that predicting the factors influential in an election and being reminded of those predictions causes people to believe more factors are influential than if they had seen the final results first. All of the effect, however, is confined to increasing the likelihood of choosing ‘all’ of the factors. There is no effect on selecting a subset of the choices (anything less than all of the options).
References
Long, J. S., & Freese, J. (2006). Regression models for categorical dependent variables using Stata. Stata Press.