## Overview - Kosfeld et al. (2005) re-analysis
This page reports on my re-analysis of the data from Kosfeld, M., Heinrichs, M., Zak, P. J., Fischbacher, U., and Fehr, E. (2005), “Oxytocin increases trust in humans,” Nature, 435, 673–676. https://doi.org/10.1038/nature03701.
The data from Kosfeld et al. (2005) was re-analyzed to show the benefits of estimation over decision-making approaches to inference. The re-analysis is discussed in a paper by Calin-Jageman & Cumming, in press at *The American Statistician*.
## What will you find here?
* The [reconstructed data from Kosfeld et al. (2005)][1].
* [An R script][2] containing our analysis to confirm proper reconstruction of the data and then re-express the findings using estimation (effect sizes and interval estimates) rather than decision-making.
* [Reconstruction of Figure 2][3] from Kosfeld et al. (2005).
## Our re-presentation of Kosfeld et al. (2005) with the estimation approach:
Our goal was to stay as true to the approach of Kosfeld et al. (2005) as possible, using the same models and assumptions, but to summarize the data through effect sizes and uncertainty. It is important to note that, regardless of the analytic strategy, the data from Kosfeld et al. (2005) is highly uncertain; it is compatible with a wide range of oxytocin effects, including the possibility that effects are vanishingly small. The point of our re-presentation is simply to illustrate how differently the same results are judged when summarized with p values than when summarized with estimates of effect sizes and uncertainty.
Kosfeld et al. (2005) made inferences about the population by using the Mann-Whitney U test (also known as the Wilcoxon rank-sum test). This tests the equality of the distribution functions for the two groups. Kosfeld et al. (2005) interpreted statistically significant results to indicate differences not only in group distributions but also in group medians (e.g. “These differences in the distribution of trust result in higher average and median trust levels for subjects given Oxytocin”, p. 674). The interpretation of differing group medians requires the assumption that the distributions are symmetrical. For the critical test comparing the placebo and oxytocin groups in the trust experiment, a one-tailed test was reported (“p = .025 one sided”). All other tests reported were two-sided.
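To make the test concrete, here is a minimal Python sketch of the Mann-Whitney U statistic with a one-sided p value from the large-sample normal approximation. It is an illustration only, not the code used in our re-analysis (that is in the R script). The function names and the toy scores are hypothetical, and the approximation omits tie and continuity corrections, so its p values will not exactly match those from statistical software.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Count pairs (xi, yj) where xi exceeds yj; a tie counts as half a pair."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u


def p_one_sided_greater(x, y):
    """One-sided p value for 'x tends to be larger than y'.

    Uses the normal approximation to the null distribution of U,
    without tie or continuity corrections (a simplifying assumption).
    """
    m, n = len(x), len(y)
    u = mann_whitney_u(x, y)
    mean_u = m * n / 2
    sd_u = sqrt(m * n * (m + n + 1) / 12)
    z = (u - mean_u) / sd_u
    return 1 - NormalDist().cdf(z)


# Hypothetical toy scores, NOT the Kosfeld et al. (2005) data.
toy_treatment = [9, 10, 11, 12]
toy_control = [1, 2, 3, 4]
toy_u = mann_whitney_u(toy_treatment, toy_control)
toy_p_one_sided = p_one_sided_greater(toy_treatment, toy_control)
```

A two-sided p value under the same approximation is twice the smaller of the two one-sided p values, which is why a one-sided p = .025 corresponds to a two-sided p = .05.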
This wiki summarizes our approach. The [analysis script][4] posted here shows the specific R commands used in our re-analysis.
### Reconstruction of original data
To recover the raw data from Kosfeld et al. (2005) we drew on their Figure 2, which presents histograms of the data from the trust and risk experiments. However, we found that the bar heights in Figure 2A sum to just 26 control participants even though the text of the manuscript reports that there were 29 control participants. Through trial and error we found that imputing 3 additional scores of 10 in the control group of the trust experiment yielded a data set that reproduced all the statistics reported in the main text and Table 1.
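The reconstruction logic can be sketched in a few lines of Python. The bar heights below are hypothetical placeholders chosen only so that they sum to 26; they are not the values read from the published figure, which are in the reconstructed data file. The helper name `expand_histogram` is also made up for this illustration.

```python
from statistics import median


def expand_histogram(counts):
    """Turn {score: number of participants} into a sorted list of raw scores."""
    scores = []
    for score in sorted(counts):
        scores.extend([score] * counts[score])
    return scores


# HYPOTHETICAL bar heights for the control group (placeholders only).
control_counts = {4: 5, 6: 6, 8: 7, 10: 5, 12: 3}
reported_n = 29  # group size stated in the text of the manuscript

control = expand_histogram(control_counts)
missing = reported_n - len(control)  # participants not shown in the figure

# Impute the missing participants with a score of 10, as described above.
control_imputed = sorted(control + [10] * missing)

print("n from figure:", len(control), "| missing:", missing)
print("median after imputation:", median(control_imputed))
```

In the actual re-analysis the check is not a single median: the imputed data set is compared against every statistic reported in the manuscript.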
The [analysis script][5] posted here re-creates Table 1 and Figure 2 from Kosfeld et al. (2005), along with other key statistics reported in the manuscript. Overall, the match to the manuscript is excellent with the 3 imputed values.
Based on the strong match with the imputed data, it seems certain that the original Figure 2 was clipped or mis-printed.
![Comparison of Figure 2 from original manuscript and reconstruction for this re-analysis][6]
### Comparing oxytocin and placebo groups in the trust experiment
As Kosfeld et al. (2005) primarily focused on median trust, we selected the difference between median trust as our measure of effect size. This was calculated using the Hodges-Lehmann estimator. Technically, this calculates the pseudomedian, but if one assumes the distributions are symmetrical (as Kosfeld et al., 2005, seem to have assumed), then the pseudomedian coincides with the median. For clarity, we elided this distinction in the main text and discussed the effect size simply as a difference in medians. Note, again, that other analytic approaches (e.g. estimating the difference between means with a t test) would still lead to the same conclusions.
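For readers unfamiliar with the Hodges-Lehmann estimator, the two-sample version is the median of all pairwise differences between the groups. A minimal Python sketch, with a hypothetical function name and hypothetical toy scores (not the Kosfeld et al. data), looks like this:

```python
from statistics import median


def hodges_lehmann_shift(treatment, control):
    """Median of all pairwise differences (treatment score minus control score)."""
    diffs = [t - c for t in treatment for c in control]
    return median(diffs)


# Hypothetical toy scores for illustration only.
toy_control = [5, 8, 8, 11]
toy_treatment = [c + 3 for c in toy_control]  # a pure shift of 3 points

shift = hodges_lehmann_shift(toy_treatment, toy_control)
print("estimated shift:", shift)
```

When one group is an exact shifted copy of the other, as in this toy case, the estimate recovers the shift. With real data the estimate need not equal the difference between the two sample medians, which is the distinction between pseudomedian and median noted above.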
There are multiple ways to quantify uncertainty, including Bayesian credible intervals and frequentist confidence intervals. As Kosfeld et al. (2005) utilized a frequentist approach, we report frequentist confidence intervals for the difference between medians. This was again based on the Hodges-Lehmann estimator. Calculations were made in R using the wilcox.test function, which can also provide the estimated pseudomedian and its confidence interval. To match the stringency of the one-tailed test used by Kosfeld et al. (2005), we calculated and report 90% confidence intervals.
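The idea behind the interval can be sketched in Python as follows. This is a large-sample approximation without tie correction, not a reproduction of what wilcox.test computes, so its endpoints can differ from R's. The function name, the choice to round the order-statistic index down (which gives a slightly wider interval), and the toy data are all assumptions made for illustration.

```python
from math import floor, sqrt
from statistics import NormalDist


def hl_confidence_interval(treatment, control, level=0.90):
    """Approximate distribution-free CI for the shift between two groups.

    The interval endpoints are order statistics of the sorted pairwise
    differences; the index k comes from the normal approximation to the
    null distribution of the Mann-Whitney U statistic.
    """
    m, n = len(treatment), len(control)
    diffs = sorted(t - c for t in treatment for c in control)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    k = floor(m * n / 2 - z * sqrt(m * n * (m + n + 1) / 12))
    k = max(k, 1)  # never index below the smallest difference
    return diffs[k - 1], diffs[m * n - k]


# Hypothetical toy data: the treatment group is the control group shifted by 10.
toy_control = list(range(1, 21))
toy_treatment = [c + 10 for c in toy_control]

ci_90 = hl_confidence_interval(toy_treatment, toy_control, level=0.90)
ci_95 = hl_confidence_interval(toy_treatment, toy_control, level=0.95)

# A two-sided 90% interval uses the same critical value as a
# one-sided test at alpha = .05.
z_one_sided_05 = NormalDist().inv_cdf(0.95)
```

Because the critical value is shared, a 90% interval that excludes zero corresponds to rejection by a one-sided test at alpha = .05, which is the sense in which the 90% level matches the stringency of the one-tailed test.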
In the text we express the difference between medians in % terms (e.g. a 17% increase in median trust). To obtain % changes we divided the estimated change in location (1) by the median in the placebo group (8).
### Estimated Power for the Non-Trust Experiment
In section 3 we estimate the power Kosfeld et al. (2005) had to detect an oxytocin effect in the non-trust experiment, in which participants played a game that involved the same risk but not trust. For the trust experiment, the standardized mean difference in performance was d = 0.47. We estimated the power to detect an effect of this size in the non-trust experiment, which involved 61 total participants and was analyzed with a two-tailed test with an alpha of .05. For an independent-samples t-test, power would be .45 if all assumptions were perfectly met. Kosfeld et al. (2005) actually used a non-parametric test. Parametric tests can lose power dramatically when their assumptions are violated, whereas non-parametric tests are more robust. Even so, our estimated power of .45 is for a t-test under ideal circumstances and is thus, if anything, optimistic for the analysis strategy utilized by Kosfeld et al. (2005). Although the ideal way to estimate a priori power for the non-trust experiment could be debated, it seems unequivocal that the experiment did not have a sample size sufficient to regularly detect the expected effect of oxytocin, and thus the negative results are unconvincing as a test for specificity of an oxytocin effect.
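As a rough check on the power figure, the calculation can be approximated in Python with a normal distribution in place of the noncentral t distribution that an exact t-test power calculation would use. The 31/30 split of the 61 participants is an assumption made for this sketch, and the function name is hypothetical. The result comes out close to the .45 quoted above.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_two_sample(d, n1, n2, alpha=0.05):
    """Normal-approximation power for a two-sided, two-sample comparison.

    d is the standardized mean difference; n1 and n2 are the group sizes.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d * sqrt(n1 * n2 / (n1 + n2))  # expected z score under the alternative
    # Probability of landing in either rejection region.
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


# d = 0.47 from the trust experiment; the 31/30 split is an ASSUMPTION.
power = approx_power_two_sample(d=0.47, n1=31, n2=30, alpha=0.05)
print(f"approximate power: {power:.2f}")
```

Under the same approximation, the sample size would have to be considerably larger before power approached the conventional .80 target.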
### Drug x Task Interaction
Kosfeld et al. (2005) made a number of comparisons between groups in the trust and non-trust experiments. They did not, however, test the critical interaction between drug and task. To do this with the non-parametric approach utilized in the original study we used the aligned rank transform test developed by Higgins & Tashtoush (1994), implemented as the aligned.rank.transform function from the ART package in R. This indicated a non-significant interaction, p = .23. An ANOVA test for an interaction also indicated a non-significant interaction between group and task, p = .20.
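To give a sense of what the aligned rank transform does, the Python sketch below covers only its alignment-and-ranking step for the interaction: each score has the estimated drug and task main effects removed, and the aligned scores are then ranked. The final step, a factorial ANOVA on the ranks in which only the interaction term is interpreted, is omitted. This is an illustration under our own simplifying assumptions (marginal means taken over all observations at a level), not the ART package's implementation; the function names and toy rows are hypothetical.

```python
from statistics import mean


def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def align_for_interaction(rows):
    """rows holds (drug, task, score) tuples; returns scores with main effects removed."""
    grand = mean(s for _, _, s in rows)
    drug_mean = {d: mean(s for dd, _, s in rows if dd == d) for d, _, _ in rows}
    task_mean = {t: mean(s for _, tt, s in rows if tt == t) for _, t, _ in rows}
    # Round so that floating-point noise does not break genuine ties.
    return [round(s - drug_mean[d] - task_mean[t] + grand, 9) for d, t, s in rows]


# Hypothetical toy rows with purely additive effects (no interaction).
toy_rows = [
    ("placebo", "trust", 5), ("placebo", "risk", 7),
    ("oxytocin", "trust", 8), ("oxytocin", "risk", 10),
]
aligned = align_for_interaction(toy_rows)
aligned_ranks = midranks(aligned)
```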
[1]: https://osf.io/8h6ut/
[2]: https://osf.io/3xc8m/
[3]: https://osf.io/rkv7c/
[4]: https://osf.io/3xc8m/
[5]: https://osf.io/3xc8m/
[6]: https://osf.io/rkv7c/download