**Economic Wellbeing Index Construction**
**************************************************************
In the IMLS report "Understanding the Social Wellbeing Impacts of the Nation's Libraries and Museums" (Norton et al., 2021), the Economic Wellbeing Index brought together the following indicators: "educational attainment (percentage of residents over age 25 with less than a high school degree; percentage of residents over age 25 with bachelor's degree or higher); median household income; share of households with investment income; per capita income; poverty rate; share of residents over age 16 in the labor force".
In Libraries in Community Systems (LinCS), we will include a few additional measures, including measures of equity along Census disaggregated categories (race, gender, single parent status). These additional measures were encouraged by LinCS partners when reviewing their own definitions of social wellbeing. In each category, partners stated that a measure definitionally had to be equitable, or show equitable distribution, in order for it to be "good" or for the community to be considered "well". Partners also requested data from 2010 forward (or at least the most recent 5 years available) so that they could see trends and changes in their communities over time. Our partners are library people, so these indexes are also constructed with an eye to how they will fit with library data. For these reasons, this index will be constructed as follows:
- Two files will be produced, one at tract and one at county boundaries
- 2019 (2015-2019 5-year ACS estimates) is the exploratory factor analysis year, but all years with consistent library measures are included, so 2013 forward
Replicating, testing, and expanding the Economic Wellbeing Index will follow these broad steps:
1. Harmonize variables across the American Community Survey 5-year estimate tables for each reporting year in the base study period (2013-2019)
2. Keep and destring variables needed for the index, as well as any additional disaggregation measures which could illuminate levels of equity
3. Calculate proportions and differences needed for analysis
4. Standardize all variables that will be used in exploratory factor analysis, pooling the standardization sample at the national level, rural/urban category at the national level, and rural/urban category within the region level
5. Conduct exploratory factor analysis, including tests of sampling adequacy and model fit for both replicated and expanded indexes
6. Construct the replication and expanded indexes at each standardization level
I begin with ACS table B01003 as the basis of all population adjustments. Educational attainment measures come from ACS table DP02 with disaggregated measures from S1501. Unfortunately, S1501 doesn't begin reporting attainment disaggregated by race and ethnicity until 2015, so these measures couldn't be incorporated into the index. They will be in supplemental tables in our library data dashboard in July.
Aggregate median household and per capita income, as well as poverty and labor force participation rates, were all taken from ACS table DP03. Disaggregations of median household income can be found in S1903, per capita income in S1902, and poverty in S1701. Disaggregation of labor force participation more closely matched our partner definitions of "economic diversity", so we hold that data for that index. The share of households with investment income is taken from table S1902.
Because all data here are taken from the same Census Bureau source, the only time-intensive step was harmonizing variable names over the years so that the correct data were kept and renamed. See the .do or markdown file for the variable mapping over years.
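As a minimal sketch of the harmonize, keep, and destring steps for a single year, assuming a CSV extract of DP03: the file name, the `geoid` key, and the raw column names `old_mhi_col` and `new_mhi_col` are placeholders rather than the actual ACS column codes, which are documented in the .do file.

```stata
* Sketch only: file name, geoid, old_mhi_col and new_mhi_col are placeholders.
local year 2015
import delimited "acs_dp03_`year'.csv", varnames(1) stringcols(_all) clear

* The same measure can sit under different column codes in different years,
* so rename it to one harmonized name before appending years together.
if `year' <= 2014 {
    rename old_mhi_col inc_mhi_e
}
else {
    rename new_mhi_col inc_mhi_e
}

* Keep only what the index needs, then convert text to numeric.
* force turns non-numeric annotations into missing values.
keep geoid inc_mhi_e
destring inc_mhi_e, replace force
gen int year = `year'
```

Importing every column as a string and converting afterwards keeps annotation codes from silently changing a column's type from one year to the next.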
**Calculating Differences**
There exists a literature exploring methods of rigorously measuring discrimination. Here, we are not making claims that one location is more or less racist or discriminatory than another. Instead, ours are measures of place-based experience: would a person living in this place observe generally equal conditions of wealth and poverty across population subgroups? For these measures we follow the calculation used in "[In the Red: The US Failure to Deliver on a Promise of Racial Equality][1]", but at more precise geographies and without their strict inclusion criteria (Lynch, Bond, Sachs, 2021).
Example code for this process reads as: ``gen `var'_dif = `var' - inc_mhi_white_e``
In plain language, I generate a new variable from each categorical variable, such as inc_mhi_black_e (income, median household income, Black respondents, estimate), adding the _dif suffix to indicate that this is the aggregate MHI for Black households minus the aggregate MHI for white households. Large negative numbers would indicate that, in aggregate, Black households have lower income than white households. In the library-index paired dataset, there is wide variation.
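The `gen` line above sits inside a loop over subgroup variables. A minimal sketch of that loop follows; only `inc_mhi_black_e` and `inc_mhi_white_e` appear in the text, so `inc_mhi_hisp_e` is a hypothetical name used for illustration.

```stata
* Sketch only: inc_mhi_hisp_e is a hypothetical variable name.
foreach var of varlist inc_mhi_black_e inc_mhi_hisp_e {
    * Subgroup median household income minus white median household income.
    * The result is missing wherever either estimate is missing.
    gen `var'_dif = `var' - inc_mhi_white_e
    label variable `var'_dif "`var' minus inc_mhi_white_e"
}

* Inspect the spread of one difference measure.
summarize inc_mhi_black_e_dif, detail
```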
Only 25% of our library observations have matched data for median Black household income, so this measure isn't incorporated into our index. In places where we have these data, the mean income is $43,464 for Black households and $53,764 for white households. The average difference is -$10,078 (median is $12,685). But across places the difference ranges from -$203,346 to $198,939!
Once all calculations are completed, geographic categories are merged into the data: RUCA commuting ranges as rural/urban categories and Bureau of Economic Analysis regions (OBEREG). These categories will be used to create standardization and analysis groupings.
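A minimal sketch of this merge for the tract file, assuming one lookup file of RUCA codes keyed by tract and one of BEA regions keyed by state. The file names and the `tract_fips`, `state_fips`, `ruca_code`, and `obereg` variable names are assumptions for illustration.

```stata
* Sketch only: file names and key variable names are hypothetical.
* Many tract-year rows match one RUCA row per tract.
merge m:1 tract_fips using "ruca_tract.dta", ///
    keepusing(ruca_code) keep(master match) nogenerate

* Many tract-year rows match one BEA region row per state.
merge m:1 state_fips using "bea_regions.dta", ///
    keepusing(obereg) keep(master match) nogenerate

* Check how observations fall across the grouping cells.
tabulate obereg ruca_code, missing
```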
**Generate Z-scores**
I use Stata command `zscore` to create z-scores at 3 population sample groupings:
- prefix z_ for standardization using the full national sample
- prefix zden for standardization using within density category at the national scale sample
- prefix zrden for standardization using within density category at the region scale sample
These z-scores are population adjusted following the method used in "Understanding the Social Wellbeing Impacts...": we simply subtract the population z-score from the z-score of the variable of interest to derive the population-adjusted measure. Standardizing variables takes place before conducting exploratory factor analysis.
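The three standardization levels and the population adjustment can be sketched with built-in `egen` functions in place of the `zscore` command. This is an illustration under assumptions, not the project code: `pop_total_e`, `density_cat`, and `obereg` are assumed variable names, the underscore after `zden` and `zrden` and the `_padj` suffix are my own conventions, and the sketch pools all years together.

```stata
* Sketch only: pop_total_e, density_cat, obereg and _padj are assumed names.
local vars pop_total_e ed_ba_pe inc_mhi_e

foreach v of local vars {
    * 1. Full national sample
    egen double m_nat = mean(`v')
    egen double s_nat = sd(`v')
    gen double z_`v' = (`v' - m_nat) / s_nat

    * 2. Within density category, national scale
    bysort density_cat: egen double m_den = mean(`v')
    bysort density_cat: egen double s_den = sd(`v')
    gen double zden_`v' = (`v' - m_den) / s_den

    * 3. Within density category inside each region
    bysort obereg density_cat: egen double m_rden = mean(`v')
    bysort obereg density_cat: egen double s_rden = sd(`v')
    gen double zrden_`v' = (`v' - m_rden) / s_rden

    drop m_nat s_nat m_den s_den m_rden s_rden
}

* Population adjustment: variable z-score minus population z-score.
foreach v in ed_ba_pe inc_mhi_e {
    foreach pre in z_ zden_ zrden_ {
        gen double `pre'`v'_padj = `pre'`v' - `pre'pop_total_e
    }
}
```

Computing the group mean and standard deviation explicitly makes it easy to see which sample each z-score is relative to.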
**Exploratory Factor Analysis**
The tract-level indicators included in the original Economic Wellbeing Index have a Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy of .795, which is adequate but not strong. Including difference measures improves the index fit. After iterative tests, I select three poverty difference measures in place of the general poverty rate included in the original index:
- female minus male
- Hispanic minus white non-Hispanic
- high school graduate (or equivalent) minus bachelor's degree holder
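A minimal sketch of the sampling adequacy check for the expanded variable set, using Stata's `factor` and its `estat kmo` postestimation command. The principal-factor extraction (`pf`), the varimax rotation, and the `year` variable are assumptions; the text does not state which extraction or rotation was used.

```stata
* Sketch only: extraction method, rotation and the year variable are assumed.
preserve
keep if year == 2019    // exploratory factor analysis year

factor z_ed_nohs_pe z_ed_ba_pe z_inc_mhi_e z_inc_invest_pe z_inc_pcap_e ///
    z_lf_16plus_pe z_pov_fem_ptdif z_pov_hisp_pe_ptdif z_pov_edhs_ptdif, pf

* Kaiser-Meyer-Olkin measure, overall and per variable
estat kmo

* Rotated loadings help show which variables move together
rotate, varimax
restore
```

Running the same commands on the original indicator list gives the replication KMO to compare against.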
**Constructing the Indexes**
Indexes are a simple summation of the tested variables' z-scores, following this code pattern: `gen znat = -z_ed_nohs_pe + z_ed_ba_pe + z_inc_mhi_e + z_inc_invest_pe + z_inc_pcap_e + z_lf_16plus_pe + z_pov_fem_ptdif + z_pov_hisp_pe_ptdif + z_pov_edhs_ptdif`
In plain language, this generates a new variable called znat equal to the sum of the listed variables, with the share of residents without a high school degree entering negatively. I create 6 indexes that follow this pattern: one for each of the three levels of standardization, built either as a strict replication or as an inequality-inclusive index. Indexes that include only the variables in the original study have the suffix _rep.
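The same pattern can be repeated across the three standardization levels with a loop. This is a sketch under assumptions: the component list is taken from the `znat` example, while the index names `zden_idx` and `zrden_idx` and the underscore after the `zden` and `zrden` prefixes are hypothetical.

```stata
* Sketch only: zden_idx, zrden_idx and the zden_/zrden_ prefixes are assumed.
* Components that enter the index positively, from the znat example.
local plus ed_ba_pe inc_mhi_e inc_invest_pe inc_pcap_e lf_16plus_pe ///
    pov_fem_ptdif pov_hisp_pe_ptdif pov_edhs_ptdif

local names znat zden_idx zrden_idx
local i = 1
foreach pre in z_ zden_ zrden_ {
    local name : word `i' of `names'
    * Start from the one component that enters negatively.
    gen double `name' = -`pre'ed_nohs_pe
    foreach v of local plus {
        replace `name' = `name' + `pre'`v'
    }
    local ++i
}
```

Because the index is a plain sum, it is missing for any observation where one component is missing.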
After merging these data with public library administrative entity records, we have 64,454 records over 2013-2019.
Lynch, A., Bond, H., Sachs, J., 2021. In the Red: The US Failure to Deliver on a Promise of Racial Equality. New York: SDSN.
[1]: https://static1.squarespace.com/static/5dadc6c4073ce72706cd29c6/t/6092b4f78863b5714bf14a09/1620227327263/In+The+Red-The+US+Failure+to+Deliver+on+a+Promise+of+Racial+Equality.pdf