**Educational Effectiveness Index**

Educational or school effectiveness is a factor measured in "Understanding the Social Wellbeing Impacts of the Nation's Libraries & Museums" with "average standardized math and reading scores on states assessments; graduation rates; dropout rates (share of 16- to 19-year-olds without a high school degree and not attending school)" (Table A1, [Technical Appendix][1], page 3).

Themes from library partner survey responses defining school effectiveness were:

- educational opportunities and growth;
- "all students";
- life/"real world" preparation at graduation (executive functioning);
- student confidence, belonging, and safety;
- school-community programs for student wellness & success (honor community practices and/or close for holy/sacred/feast/ceremonial days?)

The survey asked respondents to make data recommendations for their measures of interest. Proxies they listed were: graduation rates; dropout rates; attendance; trade school/apprenticeship/college acceptance and graduation rates; literacy rates/test scores; employment rates; scholarship availability; number of teachers compared to community size; teacher-student ratio; presence of art/enrichment/music/after-school programs; school safety; teacher and student satisfaction survey results; teacher tenure (as in how long a teacher has worked at that school) or attrition; teacher preparation/education/certification; age of curriculum; and parent inclusion at school.

Our partners want education or school effectiveness to be inclusive of the health and attachment students have to themselves, community, and culture. Some of the data they recommend, like teacher-student ratios, don't describe proficiency outcomes or belongingness outcomes. Other, more aligned, data indicators, like those collected and made available by the Civil Rights Data Collection (CRDC), have very incomplete coverage. While CRDC won't be used to develop the index, it will be translated for library tool users in July 2024.
**Data Sources**

- Average Cohort 4-year Graduation Rate (ACGR): EDFacts at school district boundary and County Health Rankings at county boundary
- Standardized math and reading scores: Stanford Education Data Archive (SEDA) CS long at school level
- Dropout rates (share of 16- to 19-year-olds without a high school degree and not attending school): Census table B14005

Measures not included in the index:

- Educational growth and learning gains: SEDA CS long at district level disaggregated by sub-categories (*these pool over years and are time invariant - they don't work within our panel models*) and CRDC retention tables;
- School practices and educational opportunities: Civil Rights Data Collection (CRDC) advanced class participation - advanced math, physics, AP, G&T programs tables;
- Attendance: CRDC chronic absenteeism table;
- Belonging & safety: CRDC sport participation, harassment, referrals & arrests, and suspension tables

**Average Cohort 4-Year Graduation Rate (ACGR)**

*School District (LEA) Boundary*

This index uses a dataset created as part of a propensity score matching study of rural school districts, which required graduation rate imputation. To protect the privacy of students, school district ACGR for graduating classes of fewer than 100 students is blurred, from 5 points to total suppression, depending on cohort size. Imputation of these blurred values was conducted using multiple interval imputation, which takes available data and conducts iterative bounded regressions. The data used for regression were values included in the Elementary and Secondary Information System (ELSi), bounded by the blur interval given in the EDFacts graduation rate (80-84%, for instance).
From that dataset, I keep only: district identifiers; the agency type (categorical variable indicating whether the local education agency (LEA) is public, unified, state run, private, etc.); locale (categorical variable indicating the level of urbanness or rurality of the district); the ACGR value given in EDFacts; the denominator used to get the rate; gradrate values which are not imputed; blur; the lower and upper bounds given by the interval; and the imputed graduation rate (gradrate_mi). The index study is for years 2013-2019, so I retain only these observations.

*County Boundary*

[County Health Rankings & Roadmaps][2] (CHR) from the University of Wisconsin Population Health Institute is the source I use for ACGR pooled on the county boundary. Unfortunately, CHR only takes this measurement every other year, and there is lagged reporting alignment. I harmonize my reporting years by mapping CHR into the index data:

- Graduation year 2013 = CHR year 2016
- Graduation year 2014 = CHR year 2016 (a duplication of graduation year 2013, which required Stata's `expand` to generate harmonized observations)
- Graduation year 2015 = CHR year 2018
- Graduation year 2016 = CHR year 2017
- Graduation year 2017 = CHR year 2019
- Graduation year 2018 = CHR year 2021
- Graduation year 2019 = CHR year 2020

*Standardized Math & Reading Scores*

Both LEA- and county-bounded scores come from the SEDA data, for more years than are included in this study, so I drop observations for years before and after 2013-2019. These data are formatted long (each grade-subject combination is a new observation), but I want observations collected on the school district-year. I split the file into 4 files: 3rd grade Reading, 3rd grade Math, 8th grade Reading, and 8th grade Math. After renaming the score variables with a subject-grade prefix, they are merged together again. The SEDA tables are the only data in this index which are available disaggregated by population sub-group: reported sex, race, and economic disadvantage.
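The CHR year harmonization above can be sketched as follows. This is an illustrative plain-Python stand-in for the Stata step (the `chr_year`/`gradyear` field names are assumptions), including the reuse of the 2016 CHR release for graduation years 2013 and 2014, which in Stata required `expand`:

```python
# Mapping from ACGR graduation year to the CHR release year that carries it.
# CHR measures every other year with lagged reporting, so CHR 2016 serves
# both graduation years 2013 and 2014.
CHR_YEAR = {
    2013: 2016,
    2014: 2016,  # duplicate of the 2013 mapping
    2015: 2018,
    2016: 2017,
    2017: 2019,
    2018: 2021,
    2019: 2020,
}

def harmonize(chr_rows):
    """chr_rows: list of dicts, one per CHR release, each with a 'chr_year'
    key. Returns one row per graduation year 2013-2019, duplicating rows
    where a CHR year is reused (the equivalent of Stata's `expand`)."""
    by_year = {r["chr_year"]: r for r in chr_rows}
    out = []
    for gradyear, chr_year in sorted(CHR_YEAR.items()):
        if chr_year in by_year:
            row = dict(by_year[chr_year])  # copy so the duplicate is distinct
            row["gradyear"] = gradyear
            out.append(row)
    return out
```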
In these files the difference is reported as a gap calculated as Male - Female, White - Other race/ethnicity, and Not Economically Disadvantaged - Disadvantaged. All of these variables are retained, but only a few are widely available across school districts.

**Portion of Youth Unattached to School**

Census American Community Survey table B14005 gives 5-year estimates of the number of 16-19 year olds within a geographic boundary, and the number of 16-19 year olds who are neither enrolled in high school nor HS diploma (or equivalency) holders.

*School District (LEA) Boundary*

While SEDA is given on the geographic boundary, that data boundary isn't available for table B14005. Instead, I merge together total population, population 16-19 years old, and unattached-to-school population from the Unified and Secondary school district bounded data files for each year 2013-2019, and then append the resulting files.

*County Boundary*

Table B14005 has excellent county-level coverage. Each downloaded year is appended together, after dropping all but total population, population 16-19 years old, and unattached-to-school population.

**Variable Treatment, Calculation, and Transformation**

Both LEA and county files receive the same data treatment, including merging in additional geographic identifiers like RUCA (county) and Locale (LEA) designations of rural/urban category, and the OBEREG (Bureau of Economic Analysis) code for the US region the state is in. The categorical variable Density is created to harmonize between RUCA and Locale codes:

- Locale 11-13 = RUCA 1-2
- Locale 21-23 = RUCA 3-5
- Locale 31-33 = RUCA 6-7
- Locale 41-43 = RUCA 8-9

This section of the attached Stata .do file destrings variables for calculation, standardization, and analysis. Each merged file is investigated for mismatched records and duplications.
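The Locale/RUCA crosswalk can be expressed as a pair of small functions. This is an illustrative plain-Python sketch (the numeric Density labels 1-4 are assumptions; only the code bands come from the crosswalk above):

```python
def density_from_locale(locale: int) -> int:
    """Harmonized Density category from an NCES Locale code (LEA files).
    Locale codes 11-13, 21-23, 31-33, 41-43 fall into four bands given
    by the tens digit; 1-4 are illustrative category labels."""
    return locale // 10

def density_from_ruca(ruca: int) -> int:
    """Harmonized Density category from a RUCA code (county files), using
    the crosswalk: RUCA 1-2 -> band 1, 3-5 -> band 2, 6-7 -> band 3,
    8-9 -> band 4."""
    if ruca <= 2:
        return 1
    if ruca <= 5:
        return 2
    if ruca <= 7:
        return 3
    return 4
```

With this harmonization, a "density 2" LEA (Locale 21-23) and a "density 2" county (RUCA 3-5) land in the same category, which is what allows the within-density standardization described below to be applied identically to both files.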
Once all variable data are cleaned to reduce instances of missing categorical or value data, they are standardized using z-scores in 3 population sample groupings:

- prefix z_ for standardization using the full national sample
- prefix zden for standardization within density category at the national-scale sample
- prefix zrden for standardization within density category at the region-scale sample

These z-scores are population adjusted using the population given in Census ACS table S2405 and merged in from the other social wellbeing indexes. Following the method used in "Understanding the Social Wellbeing Impacts...", we simply subtract the population z-score from the variable-of-interest z-score to derive the population-adjusted measure. Standardizing variables takes place before conducting exploratory factor analysis.

**Exploratory Factor Analysis**

I begin by testing the fit (Kaiser-Meyer-Olkin test of sampling adequacy) of the variables used in the replicated study's index, but at the LEA level. Generally, anything above .8 is interpreted to mean that the data in the variables are correlated enough, with enough variation, to create a reliable index. I ask Stata to subject test score, graduation rate, and portion of unattached youth to factor analysis without constraining the number of potential indexes (`factor`) and conduct a matrix rotation after (`rotate`). There is clearly only one factor - these data indicators are strongly correlated all together, without an underlying pattern of division. I follow the same steps including the proficiency gaps between male and female students, and between non-economically disadvantaged and disadvantaged students, in the testing and analysis. After exploring the fit of different combinations of disaggregation, I choose the male-female gap in proficiency because it has the fewest missing observations.
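The three standardization levels can be sketched as one grouped z-score routine. This is an illustrative plain-Python stand-in for the Stata standardization step, not the project's code; field names are assumptions, and it uses the population standard deviation for simplicity:

```python
from statistics import mean, pstdev

def zscores(values):
    """z-score a list of values against its own mean and SD."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def z_by_group(rows, var, group_keys=()):
    """Standardize `var` within groups defined by `group_keys`
    (empty tuple = full national sample). Returns z-scores aligned
    with `rows`."""
    groups = {}
    for i, r in enumerate(rows):
        groups.setdefault(tuple(r[k] for k in group_keys), []).append(i)
    out = [None] * len(rows)
    for idx in groups.values():
        for i, z in zip(idx, zscores([rows[i][var] for i in idx])):
            out[i] = z
    return out

# The three prefixes map onto group choices:
#   z_    -> z_by_group(rows, var)                        # full national sample
#   zden  -> z_by_group(rows, var, ("density",))          # within density, national
#   zrden -> z_by_group(rows, var, ("region", "density")) # within density, by region
# Population adjustment then subtracts the population z-score (ACS S2405):
#   adjusted = [z - zp for z, zp in zip(z_var, z_pop)]
```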
Indexes are a simple summation of the tested variables' z-scores, following this code pattern:

`gen nat_sei = z_gradrate - z_lefths_pe - z_rla8cs_mn_mfg - z_mth3cs_mn_mfg`

In plain language, this says: generate a new variable called nat_sei equal to the graduation rate minus the dropout rate, minus the 8th grade reading score gender gap, minus the 3rd grade math score gender gap. I create 6 indexes that follow this pattern: one for each level of standardization, either as a strict replication or as an inequality-inclusive index. Indexes that include only the variables in the original study have the suffix _rep.

**Combine SEI with PLS Data**

For simplicity, I combine the school district (LEA) data with the public library system based on the location of the system's administrative address. There are likely places where this doesn't match reality. County data gets a more complicated handling. The attached data file has variables with m prefixes:

- m = mean county data
- mn = minimum of the counties with outlets in them
- mx = maximum of the counties with outlets in them

For library systems with outlets in multiple counties, for each School Effectiveness Index there is a minimum value among the counties served, a maximum, and a mean (average). The average is used in analysis, but libraries interested in knowing the range of outcomes within their service area have that information available to them.

The dataset attached to this component has fairly consistent coverage over each of the 7 study years, totaling 64,430 observations. See the data codebook for variable definitions and summary statistics.

[1]: https://www.imls.gov/sites/default/files/2021-10/swi-appendix-i.pdf
[2]: https://www.countyhealthrankings.org/health-data/methodology-and-sources/data-documentation/national-data-documentation-2010-2022
[3]: https://drive.google.com/file/d/1PsGUtv_hK1Ka1tE228KXJqkQ_RQhAF_W/view?usp=sharing
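The m/mn/mx aggregation for multi-county library systems can be sketched directly from the description above (illustrative plain Python; the function and field names are assumptions):

```python
from statistics import mean

def aggregate_counties(index_by_county, counties_served):
    """For a library system with outlets in `counties_served`, return the
    mean (m), minimum (mn), and maximum (mx) of a county-level School
    Effectiveness Index, mirroring the m/mn/mx prefixed variables."""
    vals = [index_by_county[c] for c in counties_served]
    return {"m": mean(vals), "mn": min(vals), "mx": max(vals)}
```

The mean ("m") value is what enters the analysis; the minimum and maximum simply preserve the range of county outcomes within a system's service area.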