This website serves as the permanent repository for data and code associated with the paper "Monthly excess mortality across counties in the United States during the COVID-19 pandemic, March 2020 to February 2022", forthcoming in Science Advances. The paper estimates US excess mortality for 3,127 counties over each month of the pandemic from March 2020 to August 2022. This novel dataset can be used by policymakers to inform resource allocation decisions and by researchers interested in investigating the social, demographic, and structural factors associated with excess mortality across communities in the US.

The aim of this paper is to estimate monthly deaths at the county level for the United States for the period March 2020 - February 2022 in the counterfactual scenario of no COVID-19 pandemic. We do so with a Bayesian hierarchical model with a flexible time component and a spatial component. To deal with the suppression of death counts below 10, we employ a set of state-year censored Poisson models to estimate the censored death counts. We then adjust the total number of imputed deaths so that the sum of non-suppressed and imputed deaths equals the state yearly death count, which is very unlikely to be suppressed.

Users interested in the data output from the paper (i.e. estimates of expected and excess mortality) should look in the data/output/estimates folder, where we have made available a set of csv files with estimates aggregated at different levels. For variables which are uncertain (expected deaths, excess deaths, and quantities derived from these two variables) we provide posterior means, medians, and 90% intervals. Further aggregation should only be attempted with the mean value for each observation. Medians and posterior intervals should not be aggregated (i.e. users should not sum them or take differences).

The coding portion of the project consists of several RMarkdown and R files:

- cleanMortalityData.Rmd: Cleans the all-cause mortality data.
- cleanMortalityWavesData.Rmd: Cleans the all-cause mortality data for each pandemic wave.
- cleanCOVIDData.Rmd: Cleans the data for COVID mortality.
- cleanCOVIDWavesData.Rmd: Cleans the data for COVID mortality for each pandemic wave.
- modelFit.Rmd: Fits the monthly-level model for all-cause deaths.
- modelSummary.Rmd: Extracts the model's parameters and hyperparameters and creates summary tables and graphs.
- crossValidation.Rmd: Runs cross-validation on the model and produces a set of estimates to be used to evaluate the model.
- modelEvaluation.Rmd: Evaluates model performance using the cross-validation data.
- createSimsDF.Rmd: Creates a dataframe with county-month samples from the model's posterior distribution for death counts. This dataframe is used to construct estimates of death counts with posterior intervals at different levels of aggregation.
- estimates*.Rmd: These files (estimatesMonthly, estimatesPandemicYears, estimatesWaves, estimatesStates, estimatesStateMonths, and estimatesMonthlyTotals) produce estimates of expected death counts, excess death counts, and relative excess at different levels of temporal and geographical aggregation.
- summaryTable.Rmd: Produces a metro-division level table summarizing the key results by pandemic year.
- summaryTableFinal.Rmd: Produces an alternative metro-division level table summarizing the key results for the entire period.
- scatterPlots: Produces scatter plots of excess mortality in year 1 (March 2020 - February 2021) and year 2 (March 2021 - February 2022) and of excess mortality against COVID-19 mortality.
- timeBarsGraphDivision.Rmd: Produces a graph showing how relative excess evolved during the pandemic for each division, separating large metro areas and other areas.
- plotsForSubmission.Rmd: Creates the geofacet plot and the heatmap plot of county-level excess mortality.
- map_by_period_relative_exc_to_eugenio.R: Creates maps of relative excess.
- countyPlots: Produces a set of county-level plots of observed vs. expected deaths.

The files are intended to be run in the order in which they appear on the list. Detailed instructions on the steps we used to download data from the CDC WONDER platform are given in the file within the data/output folder.

Aside from the R files within the R folder, we also provide the Python code we used to create monthly population estimates at the county level by interpolating yearly intercensal estimates from the Census Bureau. The Python code is in the form of a Jupyter Notebook. The final population estimates are stored in the output folder.

Here is a brief description of the content of the repository:

- Python: Contains the Python code used to generate monthly population estimates.
- R: Contains the R code used to prepare the data, train the models, assess the models' performance, produce the estimates of excess mortality, and produce the figures and tables for the paper.
- data/input: Contains all the data needed to train the models and produce estimates of excess deaths. To reconcile inconsistencies across the various data sources we are using, and to accommodate further analysis with a longer backward time frame (i.e. we trained our final models on data for 2015-2019 but initially considered a wider window), we harmonized county FIPS codes following the schema in the FIPSFixes.csv file. For reasons of space, we only provide data on all-cause deaths and on COVID-19 deaths where COVID-19 is listed as the underlying cause of death.
- data/output: Will contain all the data generated as a product of the project. We have already included different sets of estimates compliant with the CDC WONDER user agreement.
- figures: Contains all the figures and tables appearing in the paper.
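The aggregation rule for uncertain quantities (sum means, but do not sum medians or interval bounds) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the draws are synthetic and independent, and the function names are hypothetical. In the repository itself, posterior samples are built in R by createSimsDF.Rmd. The idea is that intervals for an aggregate should be recomputed from the summed draws rather than from the per-county summaries.

```python
import numpy as np


def summarize(draws, level=0.90):
    """Posterior mean, median, and central interval for a 1-D array of draws."""
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(draws, [tail, 1.0 - tail])
    return {
        "mean": float(np.mean(draws)),
        "median": float(np.median(draws)),
        "lower": float(lower),
        "upper": float(upper),
    }


def summarize_total(draws_by_unit, level=0.90):
    """Sum the units draw by draw (axis 0), then summarize the aggregate."""
    return summarize(draws_by_unit.sum(axis=0), level=level)


# Synthetic stand-in for posterior draws: 3 hypothetical counties x 4000 draws.
rng = np.random.default_rng(0)
draws = np.stack([rng.gamma(shape, 3.0, size=4000) for shape in (2.0, 5.0, 9.0)])

per_county = [summarize(d) for d in draws]
total = summarize_total(draws)

# Means are additive, so summing the published means is safe.
sum_of_means = sum(s["mean"] for s in per_county)

# Interval bounds are not additive: summing them gives a different
# (here, wider) interval than the one computed from the summed draws.
naive_lower = sum(s["lower"] for s in per_county)
naive_upper = sum(s["upper"] for s in per_county)

print("mean of total:", total["mean"], "sum of means:", sum_of_means)
print("interval of total:", (total["lower"], total["upper"]))
print("naive summed bounds:", (naive_lower, naive_upper))
```

With only the published csv files (no draws), the sum of means remains valid for an aggregate, but a valid interval for that aggregate cannot be recovered from the per-observation intervals.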
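The adjustment of imputed deaths to the state yearly total, described near the top of this page, could look roughly like the sketch below. This is not the repository's implementation: proportional rescaling is an assumption chosen for simplicity, and the function name and inputs are hypothetical. The sketch only shows the constraint being enforced, namely that non-suppressed plus imputed deaths add up to the state yearly count.

```python
import numpy as np


def rescale_imputed(observed, imputed, state_total):
    """Fill suppressed cells so that the completed counts sum to state_total.

    observed    : counts with np.nan where the count was suppressed (< 10)
    imputed     : model-based estimates, used only where observed is nan
    state_total : the state yearly death count
    Assumption: imputed values are rescaled by one common factor.
    """
    observed = np.asarray(observed, dtype=float)
    imputed = np.asarray(imputed, dtype=float)
    suppressed = np.isnan(observed)

    # Deaths left to distribute across the suppressed cells.
    residual = state_total - np.nansum(observed)
    imputed_sum = imputed[suppressed].sum()
    if imputed_sum <= 0:
        raise ValueError("imputed values for suppressed cells must sum to > 0")

    completed = observed.copy()
    completed[suppressed] = imputed[suppressed] * (residual / imputed_sum)
    return completed


# Hypothetical state-year with two suppressed county-months.
observed = [120, np.nan, 45, np.nan]
imputed = [0, 4, 0, 6]
completed = rescale_imputed(observed, imputed, state_total=180)
print(completed)
```

A common scaling factor can push a rescaled value outside the suppression range, so a real implementation would need to decide how to handle that case.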
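As a rough illustration of how monthly population estimates can be obtained from yearly ones, the following sketch interpolates linearly between yearly estimates. It is not the code in the Jupyter Notebook: linear interpolation, the July reference month for each yearly estimate, the function name, and the numbers are all assumptions made for the example.

```python
import numpy as np


def monthly_population(years, yearly_estimates, start=(2020, 3), end=(2022, 2)):
    """Linearly interpolate yearly estimates to months from start to end.

    Time is measured in months as year * 12 + (month - 1). Each yearly
    estimate is assumed to refer to July of its year. Note that np.interp
    holds the first or last value constant outside the range of the anchors.
    """
    anchor_t = np.array([y * 12 + 6 for y in years], dtype=float)
    first = start[0] * 12 + (start[1] - 1)
    last = end[0] * 12 + (end[1] - 1)
    target_t = np.arange(first, last + 1)

    values = np.interp(target_t, anchor_t, yearly_estimates)
    labels = [(int(t) // 12, int(t) % 12 + 1) for t in target_t]
    return dict(zip(labels, values))


# Hypothetical county growing by 120 people per year (10 per month).
pop = monthly_population([2019, 2020, 2021, 2022], [1000, 1120, 1240, 1360])
print(pop[(2020, 3)], pop[(2020, 7)], pop[(2022, 2)])
```

The returned dictionary is keyed by (year, month) pairs, which makes it straightforward to join against county-month death counts.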