Materials
-------------
Data were collected for a cross-sectional community-level study in 2009. Information on socioeconomic factors, community demographics, H1N1 influenza-levels, vaccination rates, and azithromycin use was collected from 400 Canadian communities. The present dataset was compiled from a dataset previously used in a study of risk factors for antimicrobial use. For confidentiality purposes, communities are identified solely by number, and no data concerning regional trends can be provided.
The outcome of interest was the amount of azithromycin consumed by a community per 10 000 people in 2009. Age was represented by two variables: (1) whether the community fell in the upper quartile for children under 5, and (2) whether the community fell in the upper quartile for adults over 65. Density of health care workers was represented by the number of health care workers per 10 000 community members. Socio-economic status was represented by the percentage of households classified as low-income (20% of income spent on basic necessities). The flu-level was classified as high, moderate, or low based on national community measures. The vaccination level was represented by whether the community had a high vaccination rate based on national community measures.
Causal Diagram
--------------
A causal diagram was constructed to identify potential associations between community-level data and azithromycin use (see [Figure 1][1]). Based on previous literature, direct associations were anticipated between azithromycin use and age, density of health care workers, and H1N1 influenza-level. Azithromycin use was expected to be higher among under 5 year olds and over 65 year olds, in communities underserviced with regard to the number of health care workers, and in communities with a high flu-level. Vaccination level was expected to influence H1N1 influenza-level (higher vaccination levels being associated with lower flu-levels) and to be influenced by age, income, and density of health care workers. A higher level of vaccination was expected among those under 5 years old and those over 65 years old, and in areas with an adequate number of health care workers. Age was also expected to be related to both income and flu-level: lower income was hypothesized in communities with a high percentage of under 5 year olds or over 65 year olds, and a higher flu-level was also expected in these communities because of weaker immune systems in these age groups.
Predictors of Interest
----------------------
Based on [Figure 1][2], predictors of interest were identified to aid in statistical design and model building. Predictors of interest included age (high percentage of under 5 year olds and/or high percentage of over 65 year olds), density of health care workers, and H1N1 influenza-level. These variables were selected because of their theoretical direct association with azithromycin use, as shown in [Figure 1][3]. Potential confounders were then identified from [Figure 1][4]; age was identified as a potential confounder for H1N1 influenza-level.
Model Building and Statistical Analysis
---------------------------------------
**Statistical Analyses**
All statistical analyses were performed using Stata Intercooled 13.1. To meet the linear model assumptions of normality and homoskedasticity, the outcome variable, azithromycin consumption, had previously been log10-transformed. A linear regression model was used to test the associations between the predictors of interest (age, density of health care workers, and H1N1 flu-level) and azithromycin use.
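Because the outcome was modelled on the log10 scale, a regression coefficient describes a multiplicative rather than an additive change in azithromycin consumption. The sketch below is only an illustration in Python, not the Stata code used for the analysis; the function names and the consumption values are hypothetical.

```python
import math


def log10_transform(values):
    """Return the log10 of each (strictly positive) outcome value."""
    return [math.log10(v) for v in values]


def fold_change(coef_log10):
    """Convert a coefficient on the log10 scale to a multiplicative factor.

    A one-unit increase in the predictor multiplies the untransformed
    outcome by 10 ** coef_log10.
    """
    return 10 ** coef_log10


# Hypothetical consumption values per 10 000 people (not study data).
consumption = [100.0, 250.0, 1000.0]
transformed = log10_transform(consumption)

# A hypothetical coefficient of 0.05 on the log10 scale corresponds to
# roughly a 12% increase in consumption per unit of the predictor.
print(transformed)
print(round(fold_change(0.05), 3))
```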
***Univariable Analyses***
Initial model construction began with univariable analysis. Descriptive statistics were generated for each predictor variable, followed by determination of unconditional associations between each independent variable and azithromycin consumption. For univariable analysis, income level and density of health care workers were treated as continuous variables, while flu-level, high number of under 5 year olds, high number of over 65 year olds, and vaccination-level were treated as categorical variables. All categorical variables were dichotomous with the exception of flu-level, which was trichotomized into high, moderate, and low. Significance in the univariable models was assessed using a liberal p-value threshold of 0.25.
Collinearity between predictor variables was tested using Pearson correlation analysis, where pairs with correlation values > 0.8 were considered collinear. Continuous variables that were identified as potential explanatory variables were assessed for linearity with the outcome variable using a Lowess smoother and Lintrend. Continuous predictor variables without a linear association with the outcome variable were checked for a statistically significant quadratic relationship (p ≤ 0.05) through the introduction of a quadratic term and graphical assessment using a Lowess smoother. Continuous predictor variables that did not show a significant linear or quadratic relationship with the outcome were categorized.
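The pairwise collinearity screen can be sketched as follows. This is a minimal Python illustration rather than the Stata procedure used in the study: the variable names and values are hypothetical, and comparing the absolute value of the correlation with the 0.8 cutoff is an assumption, since the text does not state how strong negative correlations were handled.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(predictors, threshold=0.8):
    """Return (name_a, name_b, r) for each pair with |r| above the threshold.

    Using |r| is an assumption made for this sketch.
    """
    flagged = []
    for (name_a, xa), (name_b, xb) in combinations(predictors.items(), 2):
        r = pearson_r(xa, xb)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical values for five communities (not study data).
predictors = {
    "hcw_density": [10, 20, 30, 40, 50],
    "low_income_pct": [12, 19, 33, 41, 48],
    "vaccination_high": [0, 1, 0, 1, 0],
}
flagged = collinear_pairs(predictors)
for name_a, name_b, r in flagged:
    print(name_a, name_b, round(r, 3))
```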
***Multivariable Analyses***
All significant explanatory variables were placed in a multivariable model, and manual stepwise elimination was performed using a p-value ≤ 0.05 to indicate significance in the model. Before any variable was removed from the model, it was evaluated for confounding. A variable was considered a confounder if its removal changed a remaining model coefficient by ≥ 20%. All confounding variables were retained in the final multivariable model. If a variable was not significant at a p-value ≤ 0.05, was not a predictor of interest, and was not a confounding variable, it was tested with an F-test to evaluate whether it could be removed from the model. Interaction terms were assessed for statistical significance at a p-value ≤ 0.05 within the subsequent multivariable model. Interaction terms that were not significant were assessed using an F-test to evaluate whether they could be removed from the model. In both cases, an F-test was considered significant at a p-value ≤ 0.05.
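The confounding rule can be illustrated with a short Python sketch. It assumes the change is measured relative to the coefficient in the model that still contains the candidate variable, which the text does not specify; the function names and coefficient values are hypothetical and are not study results.

```python
def relative_change(coef_with, coef_without):
    """Absolute relative change in a coefficient after a variable is removed.

    Assumption: the change is expressed relative to the coefficient from
    the model that still contains the candidate variable.
    """
    return abs(coef_without - coef_with) / abs(coef_with)


def is_confounder(coefs_with, coefs_without, threshold=0.20):
    """True if any remaining coefficient changes by at least the threshold."""
    return any(
        relative_change(coefs_with[name], coefs_without[name]) >= threshold
        for name in coefs_without
    )


# Hypothetical coefficients before and after dropping "over65_high".
coefs_with = {"flu_high": 0.10, "hcw_density": -0.020, "over65_high": 0.05}
coefs_without = {"flu_high": 0.13, "hcw_density": -0.021}

# The flu_high coefficient changes by about 30%, so in this made-up
# example over65_high would be retained as a confounder.
print(is_confounder(coefs_with, coefs_without))
```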
***Outlier and Residual Analyses***
We assessed the final model for collinearity using the variance inflation factor (VIF), where a VIF > 5 that was not due to construction would be considered collinear. The assumptions of linear regression were assessed for the final model. Homoskedasticity of residuals was assessed using the Cook-Weisberg test for heteroskedasticity, where a p-value > 0.05 would be considered homoskedastic. Unusually influential points were detected using dfbeta values. Outliers were evaluated visually through standardized residual plots and through their standardized residual values; observations with an absolute standardized residual greater than 3 were defined as outliers. Outliers were examined for potential recording errors, and those without recording errors will remain in the final model. The normality of the residuals was examined visually through a Q-Q plot and through a Shapiro-Wilk test, where a p-value ≤ 0.05 would mean the normality assumption is not automatically accepted. However, with a large sample size the Shapiro-Wilk test is sensitive and may reject normality even when the departure is too small to affect the use of the model. Therefore, outcome variable transformations to address Shapiro-Wilk significance will only be conducted if (1) they do not increase heteroskedasticity and (2) the Q-Q plot is visually judged to be non-normal.
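Two of the diagnostic cutoffs above can be sketched in Python. The VIF for predictor j is 1 / (1 - R_j²), where R_j² comes from regressing that predictor on the other predictors. The sketch takes those R² values and the standardized residuals as given instead of fitting any model, and all names and numbers are hypothetical.

```python
def vif_from_r_squared(r_squared):
    """Variance inflation factor from the R^2 of a predictor regressed
    on all other predictors: VIF = 1 / (1 - R^2)."""
    return 1.0 / (1.0 - r_squared)


def flag_collinear(r_squared_by_predictor, vif_cutoff=5.0):
    """Return {predictor: VIF} for predictors whose VIF exceeds the cutoff."""
    flagged = {}
    for name, r_squared in r_squared_by_predictor.items():
        vif = vif_from_r_squared(r_squared)
        if vif > vif_cutoff:
            flagged[name] = vif
    return flagged


def flag_outliers(standardized_residuals, cutoff=3.0):
    """Return indices whose absolute standardized residual exceeds the cutoff."""
    return [
        index
        for index, residual in enumerate(standardized_residuals)
        if abs(residual) > cutoff
    ]


# Hypothetical auxiliary R^2 values and residuals (not study results).
aux_r_squared = {"hcw_density": 0.50, "low_income_pct": 0.90}
residuals = [0.4, -3.2, 1.1, 3.0, 2.9]

high_vif = flag_collinear(aux_r_squared)
outlier_indices = flag_outliers(residuals)
print(high_vif)
print(outlier_indices)
```

Observations flagged this way would still be checked for recording errors before any decision about them is made, as described above.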
[1]: https://osf.io/vsx9m/ "Figure 1"
[2]: https://osf.io/vsx9m/ "Figure 1"
[3]: https://osf.io/vsx9m/ "Figure 1"
[4]: https://osf.io/vsx9m/ "Figure 1"