To identify realistic values for test-retest and internal consistency reliability, we performed a reliability generalization of the 1916 Stanford-Binet (the test most commonly used to identify gifted children for the Terman longitudinal study). A Google search for the terms "Stanford-Binet" and either "reliability" or "stability" in works published between 1916 and 1936 was performed. Every article in the search results was examined for:

- The type of reliability coefficient (test-retest or internal consistency)
- The value of the reliability coefficient
- The sample size
- For test-retest coefficients: the interval between testings
- For split-half coefficients: whether the reliability estimate was corrected with the Spearman-Brown formula

After articles were identified and coded, a bare-bones reliability generalization was used to meta-analyze the reliability values. Before any analysis, correlations were converted to z-values with Fisher's r-to-z transformation. The split-half z-values (after Spearman-Brown correction) were weighted by sample size and averaged, and the test-retest z-values were weighted by sample size and averaged in the same way. A meta-regression was also performed with each sample's testing interval as the independent variable and its test-retest z-value as the dependent variable. All results were converted back to r values with the z-to-r transformation.

A few notes:

- All subgroups were recorded separately, as long as their observations were independent. Non-independent observations were not included in the reliability generalization. For example, Terman (1919, p. 142) and Irwin and Marks (p. 98) tested examinees more than twice, and the authors averaged the correlation between the first and second testings with the correlation between the same examinees' first and third testings. Because of this non-independence, these studies are not included in the reliability generalization.
- Whenever raw data were reported, the reliability coefficient was recalculated to ensure accuracy. These recalculations are noted in the Excel file.
- Testing intervals were recorded as the mean whenever possible. If the mean was not available, the median testing interval was used; when neither was available, the midpoint of the range of testing intervals was used.

Results:

- For test-retest reliability, the overall mean correlation (weighted by sample size) is .822 (95% CI = [.595, .933]), based on 5,019 examinees with a weighted average of 23.57 months (SD = 18.77 months) between testings. The relationship between the testing interval and the correlation between test scores is curvilinear, which is typical. The quadratic regression line levels off at about 5 years between testings (r = .726); extrapolated beyond that point, it predicts that the correlation increases again, which is not realistic. A LOESS regression line levels off at about r = .75 at 5 years between testings. Therefore, realistic simulation values for test-retest reliability from childhood to adulthood are .60 to .75. This range matches Deary et al.'s (2004) test-retest correlations of .63 to .66 from age 11 to ages 77 or 80. After correction for restriction of range, Deary et al.'s (2004) correlations increased to r = .73.
- For internal consistency, there are data from 615 examinees. All coefficients were calculated with some variation of the split-half method because KR-20 and Cronbach's alpha had not yet been developed. The weighted average split-half reliability (after Spearman-Brown correction) was r = .821 (95% CI = [.523, .940]). Therefore, realistic internal consistency values are .70 to .90.
- Both sets of reliability coefficients include outlier samples with negative correlations. Eliminating these samples changes the results only slightly. The results reported here are based on all samples, with no outliers removed.

References

Deary, I. J., Whiteman, M. C., Starr, J. M., Whalley, L. J., & Fox, H. C. (2004). The impact of childhood intelligence on later life: Following up the Scottish Mental Surveys of 1932 and 1947. Journal of Personality and Social Psychology, 86(1), 130-147. doi: 10.1037/0022-3514.86.1.130
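The bare-bones procedure described in the method section above (Spearman-Brown correction of uncorrected split-half coefficients, Fisher's r-to-z transformation, sample-size-weighted averaging of z-values, a quadratic meta-regression of test-retest z-values on testing interval, and z-to-r back-transformation) can be sketched in Python as follows. This is a minimal illustration, not the code used for this analysis: the function names are hypothetical, the sample values in the demo are made up and are not the coded Stanford-Binet data, and weighting the quadratic meta-regression by sample size is an assumption, since the text specifies sample-size weights only for the averages.

```python
import math


def spearman_brown(r_half: float, k: float = 2.0) -> float:
    """Step a part-test correlation up to a test k times as long.

    k = 2 converts a split-half correlation to full-length reliability.
    With k = 2, any r_half <= -1/3 does not give a value inside (-1, 1),
    so Fisher's z is undefined for strongly negative outlier samples.
    """
    return k * r_half / (1 + (k - 1) * r_half)


def fisher_z(r: float) -> float:
    """Fisher's r-to-z transformation: z = atanh(r)."""
    return math.atanh(r)


def inverse_fisher_z(z: float) -> float:
    """z-to-r back-transformation: r = tanh(z)."""
    return math.tanh(z)


def weighted_mean_r(rs, ns) -> float:
    """Average correlations in the z metric, weighting each sample by its n."""
    zs = [fisher_z(r) for r in rs]
    mean_z = sum(n * z for n, z in zip(ns, zs)) / sum(ns)
    return inverse_fisher_z(mean_z)


def solve_linear(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def weighted_quadratic_fit(x, y, w):
    """Weighted least squares for y = b0 + b1*x + b2*x**2.

    Builds and solves the normal equations (X^T W X) b = X^T W y.
    """
    xtwx = [[0.0] * 3 for _ in range(3)]
    xtwy = [0.0] * 3
    for xi, yi, wi in zip(x, y, w):
        row = [1.0, xi, xi * xi]
        for i in range(3):
            xtwy[i] += wi * row[i] * yi
            for j in range(3):
                xtwx[i][j] += wi * row[i] * row[j]
    return solve_linear(xtwx, xtwy)


if __name__ == "__main__":
    # HYPOTHETICAL split-half samples: (r, n, already Spearman-Brown corrected?)
    split_half = [(0.80, 40, False), (0.92, 100, True), (0.85, 60, False)]
    corrected = [r if done else spearman_brown(r) for r, _, done in split_half]
    sh_n = [n for _, n, _ in split_half]
    print("Weighted mean split-half r:", round(weighted_mean_r(corrected, sh_n), 3))

    # HYPOTHETICAL test-retest samples: (r, n, testing interval in months)
    retest = [(0.90, 120, 1.0), (0.85, 300, 12.0), (0.80, 80, 30.0), (0.74, 150, 60.0)]
    rs = [r for r, _, _ in retest]
    ns = [n for _, n, _ in retest]
    months = [m for _, _, m in retest]
    print("Weighted mean test-retest r:", round(weighted_mean_r(rs, ns), 3))

    # Meta-regression in the z metric (sample-size weights are an assumption).
    b0, b1, b2 = weighted_quadratic_fit(months, [fisher_z(r) for r in rs], ns)
    for m in (12, 36, 60):
        z_hat = b0 + b1 * m + b2 * m * m
        print(f"Predicted r at {m} months:", round(inverse_fisher_z(z_hat), 3))
```

Averaging in the z metric rather than averaging r directly follows the method described above; the back-transformation with tanh returns each result to the correlation scale. A LOESS smoother, also mentioned in the results, is not reproduced here.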