The Use of Multiple Imputation for Data Subject to Limits of Detection.

Ofer Harel, Neil Perkins, Enrique F Schisterman

doi:10.4038/sljastats.v5i4.7792

Abstract

Missing data due to limits of detection and limits of quantification are a common obstacle in epidemiological and biomedical research. We are interested in methodologies that provide unbiased and efficient estimates of these missing data while using popular statistical software. We describe a multiple imputation (MI) procedure for cross-sectional and longitudinal data which examines the sources of variation of hormone levels throughout the menstrual cycle conditional on specific biomarkers. We describe the rationale, procedure, advantages and disadvantages of the multiple imputation procedure. We also provide a comparison to commonly used missing data procedures (complete case analysis and single imputation). We illustrate our approach using the BioCycle data, where we are interested in the effects of Vitamin E and Beta-carotene on Progesterone levels. We also evaluate the longitudinal impact of changes in Vitamin E on Progesterone levels over time. Finally, we demonstrate the advantages of using MI over complete case analysis or naive single replacement in both cross-sectional and longitudinal analyses where measurements below the limit of quantification (LOQ) are unreported. We also illustrate that, if available, inclusion of data below the limit of detection (LOD) that are potentially deemed unreliable improves simple estimation substantially.

Highlights

As new biomarkers emerge in basic science settings, epidemiologists and statisticians are asked to evaluate the effectiveness and utility of these new biomarkers.
It is well known that ignoring the missing data problem and using complete case analysis (CCA) is valid under Missing Completely at Random (MCAR), but will likely be biased in other situations.
In both cross-sectional and longitudinal analyses we showed that the use of CCA leads to different models compared with single and multiple imputation.

Summary

Introduction

As new biomarkers emerge in basic science settings, epidemiologists and statisticians are asked to evaluate the effectiveness and utility of these new biomarkers. Under MCAR, the missingness (the random variable or variables that govern the missing data process) is uncorrelated with the variables to be used in the analysis. In this situation, the reason values are above or below the LOD is random and has nothing to do with the outcome of interest or other measured covariates. We assume that the LOQ level is known and that the reason for missing values is observed, which leads to the Missing at Random (MAR) assumption and the use of ignorable models. We apply similar methodology accounting for missingness to a longitudinal scenario and ask how the biomarker (over time) affects the outcome. In both scenarios we allow measurements of biomarkers to have a positive probability of falling below the LOQ, and compare the performance of MI relative to CCA and single imputation.
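The three strategies compared here (CCA, single imputation and MI) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, unreported values are represented as None, the LOQ/sqrt(2) replacement is one common convention that the text does not specify, and the pooling step assumes the standard Rubin's rules for combining the m completed-data estimates. The imputation model that generates the m data sets is deliberately left out.

```python
import math
from statistics import fmean, variance


def complete_cases(x, y):
    """CCA: keep only the pairs whose biomarker value was reported."""
    kept = [(xi, yi) for xi, yi in zip(x, y) if xi is not None]
    return [xi for xi, _ in kept], [yi for _, yi in kept]


def substitute_below_loq(x, loq):
    """Single imputation: replace each unreported value with LOQ/sqrt(2).

    The divisor sqrt(2) is an assumed convention, not taken from the paper.
    """
    return [loq / math.sqrt(2) if xi is None else xi for xi in x]


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates using Rubin's rules.

    Returns (pooled estimate, within variance, between variance, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    q_bar = fmean(estimates)            # pooled point estimate
    within = fmean(variances)           # average within-imputation variance
    between = variance(estimates)       # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return q_bar, within, between, total


# Hypothetical values: None marks a measurement below the LOQ.
vitamin_e = [None, 2.1, 3.4, None, 1.8]
progesterone = [0.9, 1.4, 2.2, 0.7, 1.1]

x_cc, y_cc = complete_cases(vitamin_e, progesterone)
x_single = substitute_below_loq(vitamin_e, loq=1.5)
pooled = pool_rubin([0.42, 0.47, 0.39], [0.010, 0.012, 0.011])
```

The total variance returned by pool_rubin includes a between-imputation term, which is what lets MI reflect the uncertainty about the unreported values; a single replacement treats the substituted value as if it had been observed.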

Data Description and Methods

Progesterone and Vitamin E in Women

Simulations

Findings

Conclusions

Journal: Sri Lankan Journal of Applied Statistics	Publication Date: Dec 15, 2014
Citations: 33	License type: cc-by
