Search Strategy, Inclusion Criteria, and Data Extraction
--------------------------------------------------------
Our review adheres to the PRISMA and Meta-analysis of Observational Studies in Epidemiology (MOOSE) reporting guidelines (see [Texts S1][1] and [S2][2]) [29]–[31]. The methods protocol is available in [Text S3][3]. A study investigator (ECS) and two research assistants (Rachel Stelmach [RS] and Claire Still [CS]) systematically searched PubMed, Embase, Web of Science, and LILACS for relevant articles from inception to October 28, 2013. We also indexed relevant studies from the bibliography of reviews by Ziegelbauer and colleagues [26] and Asaolu and Ofoezie [32]. Abstracts without published articles were considered eligible for inclusion. Additionally, we requested available unpublished research from the US Centers for Disease Control and Prevention, The Carter Center, The Task Force for Global Health, the WHO regional offices, and the authors' personal collections.
The native search engines within PubMed, Embase, Web of Science, and LILACS were used to search each respective database using Boolean operators. The search included two clusters of terms: one for STH (i.e., helminth, soil-transmitted helminth, geohelminth, ascaris, lumbricoides, trichuris, trichiura, hookworm, ancylostoma, duodenale, necator, americanus, strongyloid*, stercoralis) and one for WASH (i.e., sanitation, sanitary engineering, water supply, waste management, environment*, excre*, faec*, fecal, feces, hand washing, handwashing, hygiene, latrine*, toilet*, water, soap). Results had to contain at least one term from each cluster. “Extensive search” was enabled when searching with Embase. Because Embase allowed at most 5,000 records per export, results were stratified by date so that all of them could be screened and exported in smaller segments. All search records were exported to bibliographic files and imported into EndNote X5 (Thomson Reuters), which was used to manage and screen search results. Titles and, when available, abstracts were scanned by an investigator (ECS) and also independently by research assistants (RS and CS) to determine possible relevance. Final selection was based on the full text of all potentially applicable articles. Ambiguous articles were examined by a senior reviewer (MCF).
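The two-cluster logic described above (OR within a cluster, AND between clusters) can be sketched as follows. This is only an illustration: the function names are hypothetical, and the exact query syntax, phrase quoting, and wildcard handling differ between PubMed, Embase, Web of Science, and LILACS.

```python
# Term clusters copied from the search description; a trailing * is a wildcard.
STH_TERMS = [
    "helminth", "soil-transmitted helminth", "geohelminth", "ascaris",
    "lumbricoides", "trichuris", "trichiura", "hookworm", "ancylostoma",
    "duodenale", "necator", "americanus", "strongyloid*", "stercoralis",
]
WASH_TERMS = [
    "sanitation", "sanitary engineering", "water supply", "waste management",
    "environment*", "excre*", "faec*", "fecal", "feces", "hand washing",
    "handwashing", "hygiene", "latrine*", "toilet*", "water", "soap",
]


def or_cluster(terms):
    """Join one cluster with OR, quoting multi-word phrases (assumed syntax)."""
    parts = ['"' + t + '"' if " " in t else t for t in terms]
    return "(" + " OR ".join(parts) + ")"


def build_query(sth_terms, wash_terms):
    """Require at least one term from each cluster by ANDing the two groups."""
    return or_cluster(sth_terms) + " AND " + or_cluster(wash_terms)


query = build_query(STH_TERMS, WASH_TERMS)
print(query)
```

A record matches this query only if it contains at least one STH term and at least one WASH term, which is the inclusion rule stated in the text.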
Publications in all languages were considered. Studies in English, Spanish, Portuguese, and French were screened by investigators directly. Chinese-language articles were reviewed by a study collaborator (Shuyuan Huang [SH]), who assessed eligibility and extracted relevant data for the research team. Relevant data from all eligible studies were abstracted by a reviewer (ECS) and independently by assistants (RS and CS). Extracted data included study design, setting, year, population characteristics, WASH components measured, diagnostic approach, STH species, and relevant effect measures. Odds ratios (ORs) served as the primary effect measure in the reviewed literature. We collected both crude and adjusted estimates if available. Excel 2007 (Microsoft) was used to input and manage data using a long format to accommodate multiple effect estimates per study.
An article was eligible for inclusion if it presented a measure of effect between WASH and STH (e.g., an OR). For studies that pooled multiple intestinal parasites (e.g., Giardia intestinalis and STH) into one outcome measure, we contacted authors to request disaggregated data. We did not exclude studies based on methodology or population characteristics. Studies that evaluated multiple WASH components were included, as long as the components could be assessed separately from deworming medications and other non-WASH interventions.
There are few standard definitions for WASH access and practices, and it is difficult to measure WASH behaviors objectively [33]. We were unable to consistently connect water and sanitation variables reported in retrieved studies to the WHO and UNICEF Joint Monitoring Program's water and sanitation ladder definitions [34],[35]. For this review, “treated water” is defined as the use of any chemical or physical treatment of water to change its potability, whether conducted at the source or at the point of use. Two specific forms of treatment included boiling and filtering water at home. “Piped water” describes access to, or use of, water collected from a piped infrastructure, regardless of where the water is accessed (public/private) or how well maintained the infrastructure may be. “Sanitation access” was our primary sanitation exposure, defined as access to, or use of, any latrine. We did not exclude studies that lacked information about latrine quality, so access to sanitation could refer to anything from simple pit latrines to flush toilets. For hygiene, “washing after defecation” refers to the availability of handwashing resources (e.g., a wash basin) near sanitation facilities or reported handwashing behavior after defecation. For “soap use or availability,” the comparison group could be either washing with water alone or no washing. Further, these definitions do not incorporate any criteria for compliance or consistency, since such details were rare in the retrieved literature.
Statistical Methods
-------------------
We conducted meta-analyses for groups of effect estimates that related similar WASH access or practices (e.g., latrine availability and/or use became “sanitation access”) to a common outcome. Potential outcomes included infection with a specific STH (i.e., A. lumbricoides, T. trichiura, hookworm, and S. stercoralis) or any STH generally. Note that “any STH” reflected infection with an individual species or co-infection with multiple species when authors reported aggregated STH infection results. Meta-analyses were performed for groups of independent effect estimates that numbered three or greater and shared a similar exposure and infection outcome. A study that measured several WASH components could contribute to multiple meta-analyses, but could only supply one effect estimate for any single meta-analysis.
We employed random-effects models to account for the expected heterogeneity between studies [36]. Only adjusted estimates were utilized to limit the impact of confounding on pooled effect measures [37]. When necessary, we inverted estimates to reflect the effect of WASH, rather than the absence of WASH. This inversion was necessary in order to ensure enough study estimates were available for meta-analysis, but could have resulted in additional heterogeneity. For example, the inverse of “no sanitation access” may be similar to, but distinct from, “sanitation access” when assessed by questionnaire due to bias associated with socially desirable responses. Further, the presence of WASH access or practices may not necessarily be the same as the inverse effect of their absence, especially if important confounders or effect modifiers remain unexplored. Estimates of effect not included in meta-analyses were summarized in the text. The meta-analysis package MAIS for Stata version 12 (StataCorp) was used to perform the random-effects meta-analyses with the DerSimonian and Laird method [38]. The natural log of reported ORs was the dependent variable. CIs use the 95% level unless otherwise noted.
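As a rough illustration of the pooling step, the sketch below inverts an OR reported for the absence of WASH and then applies the DerSimonian and Laird moment estimator to log ORs. It is a minimal reimplementation under stated assumptions, not the Stata routine used in the review: standard errors are recovered from reported 95% CIs, the function names are hypothetical, and the three study estimates are invented for demonstration.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% CI


def invert_or(or_, lower, upper):
    """Re-express an OR for 'no WASH' as an OR for 'WASH' (CI bounds swap)."""
    return 1 / or_, 1 / upper, 1 / lower


def log_or_and_variance(or_, lower, upper):
    """Log OR and its variance, with the SE recovered from the 95% CI width."""
    se = (math.log(upper) - math.log(lower)) / (2 * Z95)
    return math.log(or_), se ** 2


def dersimonian_laird(estimates):
    """Pool (OR, lower, upper) tuples with a random-effects model."""
    pairs = [log_or_and_variance(*e) for e in estimates]
    ys = [y for y, _ in pairs]
    vs = [v for _, v in pairs]

    # Fixed-effect (inverse-variance) pooled estimate and Cochran's Q.
    w = [1 / v for v in vs]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))

    # Moment estimate of the between-study variance, truncated at zero.
    df = len(ys) - 1
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in vs]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return {
        "or": math.exp(y_re),
        "ci": (math.exp(y_re - Z95 * se_re), math.exp(y_re + Z95 * se_re)),
        "tau2": tau2,
        "q": q,
    }


# Hypothetical estimates; the third was reported for "no sanitation access".
studies = [(0.60, 0.40, 0.90), (0.85, 0.55, 1.31), invert_or(2.10, 1.20, 3.68)]
result = dersimonian_laird(studies)
print(result)
```

Pooling on the log scale keeps the CIs symmetric around the estimate, which is why the natural log of each OR is the quantity being averaged.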
Bias Assessment and Evidence Quality
------------------------------------
We used the GRADE framework to assess potential sources of bias within studies and determine overall strength of evidence for each meta-analysis [39]. The GRADE approach is used to contextualize or justify intervention recommendations with four levels of evidence quality, ranging from very low to high. These levels correspond to how likely it would be for further research to alter conclusions drawn from the current evidence. “High quality” suggests that it is very unlikely for conclusions about effect estimates to change, whereas “very low quality” suggests that any estimate of effect is highly uncertain [40]. We formed our key bias categories from the literature, GRADE recommendations [41], and two instruments highlighted by the Cochrane Collaboration [42]: the Downs and Black tool [43] and the Newcastle-Ottawa scale [44]. We focused on five potential sources of bias in our assessment of individual studies: (i) diagnostic approach for assessing STH infection; (ii) exposure assessment; (iii) confounding assessment; (iv) response rate; and (v) selective reporting. Each study received one of three rankings for each source of bias: low risk, unclear risk, or high risk. Detailed criteria for these categories are available in [Table 1][4]. Bias was assessed independently by ECS and one of the two research assistants (RS and CS), compared, and reviewed by a senior assessor (DGA or MCF) if necessary.
We assessed the overall quality of evidence for each meta-analysis after considering seven key characteristics. Each meta-analysis could receive a quality grade of very low, low, moderate, or high [45]. Meta-analyses of observational studies were classified as “low” by default, but could be downgraded (because of imprecision, indirectness, inconsistency, publication bias, and potential confounding) or upgraded (because of magnitude of effect, dose-response relationship, and potential confounding) on the basis of the overall strength of the evidence.
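The grading logic can be pictured as moving along a four-step ladder from the default “low” starting point. The sketch below is an assumption-laden simplification: it treats each applicable criterion as moving the grade by exactly one level and clamps the result to the ladder, which the text does not state explicitly. The function name and the criterion labels are hypothetical.

```python
LEVELS = ["very low", "low", "moderate", "high"]

# Criteria named in the text; confounding can act in either direction.
DOWNGRADE_CRITERIA = {
    "imprecision", "indirectness", "inconsistency",
    "publication bias", "confounding",
}
UPGRADE_CRITERIA = {"magnitude of effect", "dose-response", "confounding"}


def grade_observational(downgrades=(), upgrades=()):
    """Return a GRADE level, starting from 'low' for observational evidence.

    Assumption: each criterion shifts the grade by one level.
    """
    down, up = set(downgrades), set(upgrades)
    unknown = (down - DOWNGRADE_CRITERIA) | (up - UPGRADE_CRITERIA)
    if unknown:
        raise ValueError("unknown criteria: " + ", ".join(sorted(unknown)))
    index = LEVELS.index("low") + len(up) - len(down)
    return LEVELS[max(0, min(index, len(LEVELS) - 1))]


print(grade_observational())
print(grade_observational(upgrades=["magnitude of effect"]))
print(grade_observational(downgrades=["imprecision", "inconsistency"]))
```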
Inconsistency (i.e., heterogeneity) was assessed with the I² statistic and Cochran's Q-test [46]. I² provides an estimate of the proportion of variability in a meta-analysis that is explained by differences between the included studies instead of sampling error [47]. If a meta-analysis exhibited an I² value over 50%, there was potential cause for concern, and the Q-test was also checked for a p-value less than 0.10. Values for I² over 70% or Q-test p-values lower than 0.05 resulted in the automatic downgrading of a body of evidence.
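The heterogeneity thresholds above can be sketched in a few lines. The code computes Cochran's Q from inverse-variance weights, derives I² as (Q − df)/Q truncated at zero, and obtains the Q-test p-value from a chi-square distribution with df = k − 1. The helper names and example numbers are hypothetical. The text does not say how the p < 0.10 check altered a decision when I² fell between 50% and 70%, so the sketch only flags that range as a potential concern.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability for integer df (standard recurrence)."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        sf, nu = math.exp(-x / 2), 2
    else:
        sf, nu = math.erfc(math.sqrt(x / 2)), 1
    while nu < df:
        sf += (x / 2) ** (nu / 2) * math.exp(-x / 2) / math.gamma(nu / 2 + 1)
        nu += 2
    return sf


def heterogeneity(log_ors, variances):
    """Return (Q, I2 in percent, Q-test p-value) for a set of log ORs."""
    w = [1 / v for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w, log_ors)) / sum(w)
    q = sum(wi * (yi - pooled) ** 2 for wi, yi in zip(w, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2, chi2_sf(q, df)


def inconsistency_rating(i2, p_value):
    """Apply the thresholds described in the text."""
    if i2 > 70 or p_value < 0.05:
        return "downgrade"
    if i2 > 50:
        return "potential concern"
    return "no concern"


# Hypothetical log ORs with equal variances, chosen to be clearly heterogeneous.
q, i2, p = heterogeneity([-1.0, 0.0, 1.0], [0.04, 0.04, 0.04])
print(q, i2, p, inconsistency_rating(i2, p))
```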
Publication bias was assessed through a visual inspection of funnel plots, though Egger's test also informed our interpretation [48]. Evidence quality was downgraded due to “imprecision” if the pooled effect estimate's 95% CI overlapped with the null (i.e., the estimate was not statistically significant at the 0.05 level). Although we provide CIs for pooled point estimates, imprecision remains a valuable criterion since not all consumers of reviews understand the importance of CIs and statistical uncertainty.
Evidence quality was upgraded owing to large magnitude of effect if the meta-analysis yielded a pooled OR less than 0.33 or greater than 3.0 [41]. Traditionally, risk ratios (RRs) are considered to show a large magnitude if they are less than 0.5 or greater than 2.0. However, ORs overstate the effect size compared to RRs, especially when initial risk (i.e., the prevalence of the outcome of interest) is high [49]. Because STH infection is relatively common, a more conservative threshold was needed for ORs in order to qualify as a large magnitude of effect.
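To see why ORs need stricter thresholds than RRs when the outcome is common, one widely used approximation converts an OR to an RR given the risk in the reference group: RR = OR / (1 − p0 + p0 × OR). The sketch below applies it at a few baseline risks. The baseline values are illustrative assumptions, not figures from the review, and the conversion is shown only to build intuition, not as the derivation of the 0.33 and 3.0 cut-offs.

```python
def or_to_rr(odds_ratio, baseline_risk):
    """Approximate RR implied by an OR at a given reference-group risk."""
    return odds_ratio / (1 - baseline_risk + baseline_risk * odds_ratio)


def is_large_effect(pooled_or):
    """Upgrade rule from the text: pooled OR below 0.33 or above 3.0."""
    return pooled_or < 0.33 or pooled_or > 3.0


# Illustrative baseline risks (prevalence in the comparison group).
for p0 in (0.05, 0.25, 0.50):
    print(p0, round(or_to_rr(0.33, p0), 3), round(or_to_rr(3.0, p0), 3))
```

At a rare outcome the OR and RR nearly coincide, whereas at a baseline risk of 0.25 an OR of 3.0 corresponds to an RR of only 2.0 under this approximation.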
Evidence quality could also be upgraded or downgraded on the basis of any unaccounted sources of potential confounding that would likely bias the effect estimate in a predictable direction. For example, hygiene behaviors are typically over-reported in surveys, which could reduce the measured strength of effect for hygiene practices since the exposure group includes those who did not practice hygiene [50]–[52].
Due to the breadth of the review, indirectness was not a common concern, but would be more important for future reviews that focus on specific populations, settings, or interventions. Dose-response relationships were assessed by examining studies where exposures were discretized into ranked categories, e.g., analyzing “always washes hands” versus both “sometimes” and “never.” A dose-response relationship was considered possible if the point estimates improved between the ordinal categories, especially if relevant CIs did not overlap. Additional details about the meta-analysis GRADE criteria are available in [Table 2][5].
[1]: https://osf.io/pyqu9/ "PRISMA checklist"
[2]: https://osf.io/at6ds/ "MOOSE checklist"
[3]: https://osf.io/qhnf6/ "Original methods protocol"
[4]: https://osf.io/d9wja/ "Table 1"
[5]: https://osf.io/gz9a4/ "Table 2"