Abstract
To the Editor: Epidemiologic studies sometimes have large proportions of missing data, especially in environmental studies where personal or environmental measurements of pollutants are difficult and may be performed only in subsamples.¹ Omitting missing values and single imputation are the usual strategies for estimating health effects. Other approaches to handling missing values include multiple imputation²,³ and hierarchical Bayesian approaches, which jointly simulate the distributions of variables with missing data and unknown parameters in a regression model.⁴ We compared the performance of these approaches using a case study and a simulation study (eAppendix, https://links.lww.com/EDE/A671).

The case study addresses the association of formaldehyde exposure with lower respiratory infections during the first year of life (infection prevalence 46%) in the PARIS (Pollution and Asthma Risk: an Infant Study) cohort.⁵ Formaldehyde measurements⁶ were performed in a 5% random sample of infants’ dwellings (142 of 2551 infants). Random sampling of this kind is the best way to ensure that the missing-completely-at-random assumption needed for imputation techniques holds.⁷ The simulation study generated data with characteristics close to those of the real data while controlling the true value of the odds ratio (OR = 1.0, 1.2, and 1.4) and varying the proportion of missing values (0%, 50%, and 95%). For each scenario, 100 datasets were generated.

Measured formaldehyde levels were only weakly associated with infection after adjustment for potential confounders (OR = 1.11; 95% confidence interval = 0.48, +∞). The single-imputation technique produced higher estimates (1.91; 1.53, +∞). The hierarchical Bayesian approach produced an OR (1.27; 1.10, +∞) very similar to that obtained with multiple imputation (1.28; 0.91, +∞), although with much better precision. The Table provides results from the simulation study when 95% of the data are missing.
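The simulation design described above can be illustrated with a minimal sketch. It covers only the omit-missing-values strategy and does not reproduce the eAppendix. The function names `fit_logistic` and `simulate_complete_case`, the standard normal exposure, the absence of confounders, the one-sided Wald test at the 5% level, and the definition of relative bias on the OR scale are all assumptions made for illustration; only the sample size (2551), the outcome prevalence (46%), the true ORs, the 95% missing proportion, and the 100 replicates come from the text.

```python
import numpy as np

def fit_logistic(x, y, n_iter=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (beta, standard errors)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info = X.T @ (X * w[:, None])            # Fisher information matrix
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    w = p * (1.0 - p)
    cov = np.linalg.inv(X.T @ (X * w[:, None]))  # asymptotic covariance of beta
    return beta, np.sqrt(np.diag(cov))

def simulate_complete_case(true_or, n=2551, p_missing=0.95, n_rep=100, seed=0):
    """Return (relative bias of the OR, proportion of significant associations)."""
    rng = np.random.default_rng(seed)
    intercept = np.log(0.46 / 0.54)              # prevalence near 46% at mean exposure
    or_hat = np.empty(n_rep)
    significant = np.empty(n_rep, dtype=bool)
    for r in range(n_rep):
        x = rng.standard_normal(n)               # standardized exposure (assumption)
        p = 1.0 / (1.0 + np.exp(-(intercept + np.log(true_or) * x)))
        y = rng.binomial(1, p).astype(float)
        observed = rng.random(n) >= p_missing    # missing completely at random
        beta, se = fit_logistic(x[observed], y[observed])
        or_hat[r] = np.exp(beta[1])
        significant[r] = beta[1] / se[1] > 1.645 # one-sided 5% Wald test (assumption)
    relative_bias = (or_hat.mean() - true_or) / true_or
    return relative_bias, significant.mean()

if __name__ == "__main__":
    for true_or in (1.0, 1.2, 1.4):
        rb, ps = simulate_complete_case(true_or)
        print(f"OR={true_or}: relative bias={rb:.3f}, proportion significant={ps:.2f}")
```

Under these assumptions, the proportion of significant associations at OR = 1.0 estimates the type I error, and at OR = 1.2 or 1.4 it estimates power. An imputation or Bayesian analysis would replace the `fit_logistic` call on the observed subset with the corresponding procedure.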
The relative bias and proportion of statistically significant associations (a measure of precision) are provided for each OR. There was little bias after omitting missing values or using the Bayesian approach. Among the imputation techniques, single imputation led to a strong bias, especially with the higher ORs. When OR = 1.0, only the Bayesian approach produced a proportion of significant associations near 5% (type I error), whereas single imputation strongly overestimated the proportion of significant associations. If we exclude single imputation, which generated the highest bias and was most susceptible to type I error, the highest proportions of statistically significant associations (the most precise estimates) were provided by Bayesian models, with all other approaches having relatively low power.

TABLE. Simulation Study: Relative Bias (RB) and Proportion of “Significant” Associationsᵃ (PS) With 95% Confidence Interval for Different Odds Ratios (OR = 1.0, 1.2, and 1.4) and 95% Missing Values, Based on 100 Replicates

Among the various methods for dealing with missing data, none is superior in all circumstances. When a large proportion of the data is missing, omitting missing values and single imputation do not perform best, as has previously been described.² Unlike single imputation, multiple imputation takes into account the uncertainty of estimates and consequently should be preferred. Still, true associations can be hidden by overcoverage; this conservatism has been previously noted.⁸ Bayesian approaches seem to be the most efficient (with low bias and high precision). In formulating guidelines on how to handle large amounts of missing data, further studies might consider varying the outcome prevalence and the mechanisms of missingness.

ACKNOWLEDGMENTS
We thank the families for their participation and the administrative staff for their involvement in the PARIS study.
Célina Roda
Ioannis Nicolis
Univ Paris Descartes, Sorbonne Paris Cité, EA 4064 Santé Publique et Environnement, Paris, France

Isabelle Momas
Univ Paris Descartes, Sorbonne Paris Cité, EA 4064 Santé Publique et Environnement, Paris, France
Mairie de Paris, Direction de l’Action Sociale de l’Enfance et de la Santé, Cellule Cohorte, Paris, France
[email protected]

Chantal Guihenneuc-Jouyaux
Univ Paris Descartes, Sorbonne Paris Cité, EA 4064 Santé Publique et Environnement, Paris, France