Abstract
Background
Pediatric diarrhea, a leading cause of under-five mortality, is predominantly infectious in etiology. As many putative causal agents are zoonotic, animal exposure is a likely risk factor. To evaluate the effect of animal-related factors on moderate to severe childhood diarrhea in rural Kenya, where animal contact is common, Conan et al. studied 73 matched case-control pairs from 2009-2011, collecting rich exposure data on many dimensions of animal contact. We review the challenges associated with analyzing moderately sized datasets with a large number of predictors and present two alternative methodological approaches.
Methodology/Principal findings
We conducted a simulation study to demonstrate that forward stepwise selection results in overfit models when data are high-dimensional, and that p values reported directly from the data used to train these models are misleading. We described how automated methods of variable selection, attractive when the number of predictors is large, can result in overadjustment bias. We proposed an alternative a priori regression approach not subject to this bias. Applied to Conan et al.’s data, this approach found a non-significant but positive trend for the household’s sharing of water sources with livestock or poultry, the child’s presence at poultry slaughter, and the child’s habit of playing where poultry sleep or defecate. For many of the predictors evaluated, few pairs were discordant, suggesting that matching compromised the power of this analysis. Finally, we proposed latent variable modeling as a complementary approach and performed Item Response Theory modeling on Conan et al.’s data, with animal contact as the latent trait. We found a moderate but non-significant effect (OR 1.21; 95% CI 0.78, 1.87; unit = 1 standard deviation).
Conclusions/Significance
Automated methods of model selection are appropriate for prediction models when fit and evaluated on separate samples. However, when the goal is inference, these methods can produce misleading results. Furthermore, case-control matching should be done with caution.
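The link between discordant pairs and power can be made concrete with a small sketch. For a 1:1 matched case-control design and a single binary exposure, the conditional odds ratio depends only on the discordant pairs, so concordant pairs add nothing to the estimate. The function name `matched_pair_odds_ratio` and the pair counts below are hypothetical and chosen only for illustration; they are not taken from Conan et al.'s data, and the Wald interval is one common approximation rather than the method used in the paper.

```python
import math

def matched_pair_odds_ratio(pairs, z=1.96):
    """Conditional odds ratio for 1:1 matched pairs and one binary exposure.

    pairs: iterable of (case_exposed, control_exposed) booleans, one per pair.
    Only discordant pairs contribute: b = case exposed / control unexposed,
    c = control exposed / case unexposed. The estimate is b / c.
    """
    pairs = list(pairs)
    b = sum(1 for case, ctrl in pairs if case and not ctrl)
    c = sum(1 for case, ctrl in pairs if ctrl and not case)
    result = {"n_pairs": len(pairs), "b": b, "c": c, "or": None, "ci": None}
    if b == 0 or c == 0:
        return result  # estimate undefined without both kinds of discordant pair
    log_or = math.log(b / c)
    se = math.sqrt(1 / b + 1 / c)  # Wald standard error on the log scale
    result["or"] = b / c
    result["ci"] = (math.exp(log_or - z * se), math.exp(log_or + z * se))
    return result

# Hypothetical data: 73 pairs, of which only 12 are discordant.
hypothetical_pairs = (
    [(True, False)] * 8      # case exposed, control unexposed
    + [(False, True)] * 4    # control exposed, case unexposed
    + [(True, True)] * 30    # concordant, both exposed
    + [(False, False)] * 31  # concordant, both unexposed
)
result = matched_pair_odds_ratio(hypothetical_pairs)
print(result)
```

In this made-up example the point estimate is 2.0, yet the interval spans 1 because only 12 of the 73 pairs carry information about the exposure.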
Highlights
Diarrheal disease is the leading cause of pediatric malnutrition and the second leading cause of under-five mortality, with over 1.7 billion childhood cases and more than 500,000 under-five deaths each year [1].
In forward stepwise selection applied to 100 random case-control samples (N = 146, P = 34), in which the crude model presented in Table 2 of Conan et al. [7] is true, the median number of ... [Predictors: Bovine defecated in cooking area; Child present during chicken butchering; Adult cats present but do not sleep in living area; Adult cats sleep in living area]
With our proposed a priori approach, none of the effect estimates reached statistical significance, but there was suggestive evidence of a trend for livestock and poultry sharing the household’s water sources, the child’s presence during poultry dressing, and the child playing where poultry sleep or defecate. These findings suggest that poultry exposure is a risk factor for moderate to severe diarrhea.
Summary
Diarrheal disease is the leading cause of pediatric malnutrition and the second leading cause of under-five mortality, with over 1.7 billion childhood cases and more than 500,000 under-five deaths each year [1]. To identify animal-related risk factors for moderate-to-severe diarrhea, the GEMS Zoonotic Enteric Diseases (GEMS-ZED) sub-study was conducted from November 2009 to February 2011 among subjects enrolled at one of the six GEMS sites in rural western Kenya. This study, whose methods and findings are published in PLOS NTD [7] and summarized here, represented an impressive effort to collect detailed data on animals and animal-related exposures in a matched case-control pediatric population, thereby addressing an important scientific question: to what extent does direct and indirect contact with animals and animal excreta bear on pediatric diarrheal disease? To evaluate the effect of animal-related factors on moderate to severe childhood diarrhea in rural Kenya, where animal contact is common, Conan et al. studied 73 matched case-control pairs from 2009-2011, collecting rich exposure data on many dimensions of animal contact. We review the challenges associated with analyzing moderately sized datasets with a large number of predictors and present two alternative methodological approaches.
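Why a large number of predictors is a problem for data-driven selection can be seen in a deliberately simplified sketch. It is not the simulation reported in the paper, which fits forward stepwise models to samples drawn under a specified true model. Here we assume that all 34 candidate predictors are unrelated to the outcome and mutually independent, so that each univariable p value is uniform on (0, 1), and we look only at the first step of a forward selection, which picks the predictor with the smallest p value. The function name `prob_spurious_first_pick` is hypothetical.

```python
import random

def prob_spurious_first_pick(n_predictors, alpha=0.05, n_sims=20000, seed=0):
    """Estimate how often the best of several null predictors looks 'significant'.

    Assumption: under the null, each predictor's p value is an independent
    Uniform(0, 1) draw. The first forward-selection step keeps the smallest.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        smallest_p = min(rng.random() for _ in range(n_predictors))
        if smallest_p < alpha:
            hits += 1
    return hits / n_sims

n_predictors = 34  # matches P = 34 in the highlighted simulation setting
simulated = prob_spurious_first_pick(n_predictors)
analytic = 1 - (1 - 0.05) ** n_predictors  # closed form under independence

print(f"simulated: {simulated:.3f}, analytic: {analytic:.3f}")
```

Under these assumptions the selected predictor has a nominal p value below 0.05 in roughly four out of five null datasets, which is why p values computed on the same data used for selection cannot be read at face value.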