Abstract

Missing data due to limits of detection and quantification are a common obstacle in epidemiological and biomedical research. We are interested in methodologies that provide unbiased and efficient estimates of these missing data while using popular statistical software. We describe a multiple imputation (MI) procedure for cross-sectional and longitudinal data which examines the sources of variation of hormone levels throughout the menstrual cycle conditional on specific biomarkers. We describe the rationale, procedure, advantages and disadvantages of the multiple imputation procedure. We also provide a comparison to commonly used missing data procedures (complete case analysis and single imputation). We illustrate our approach using the BioCycle data, where we are interested in the effects of Vitamin E and Beta-carotene on Progesterone levels. We also evaluate the longitudinal impact of changes in Vitamin E on Progesterone levels over time. Finally, we demonstrate the advantages of using MI over complete case analysis or naive single replacement in both cross-sectional and longitudinal analyses where measurements below the limit of quantification (LOQ) are unreported. We also illustrate that, when available, including data below the limit of detection (LOD) that might otherwise be deemed unreliable improves simple estimation substantially.

Highlights

  • As new biomarkers emerge in basic science settings, epidemiologists and statisticians are to evaluate the effectiveness and utility of these new biomarkers

  • It is well known that ignoring the missing data problem and using complete case analysis (CCA) will be valid under Missing Completely at Random (MCAR), but will likely be biased in other situations

  • In both cross-sectional and longitudinal analyses we showed that the use of CCA will lead to different models compared with single and multiple imputation


Summary

Introduction

As new biomarkers emerge in basic science settings, epidemiologists and statisticians are to evaluate the effectiveness and utility of these new biomarkers. Under MCAR, the missingness, that is, the random variable(s) governing the missing data process, is uncorrelated with the variables to be used in the analysis. In this situation, the reason values are above or below the LOD is random and has nothing to do with the outcome of interest or other measured covariates. We assume that the LOQ is known and that the reason for the missing values is observed, which leads to the Missing at Random (MAR) assumption and the use of ignorable models. We apply similar methodology accounting for missingness to a longitudinal scenario and ask how the biomarker (over time) affects the outcome. In both scenarios we allow biomarker measurements to fall below the LOQ with positive probability, and we compare the performance of MI relative to CCA and single imputation.
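
Any MI analysis ends by combining the estimates obtained from the individual imputed data sets. A minimal sketch of that pooling step, using Rubin's rules, is given below. It is an illustration only and not the authors' procedure: the function name `pool_rubin` and the numeric inputs are hypothetical, and the imputation model that would generate values below the LOQ is not shown.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    estimates: one point estimate per imputed data set
    variances: the squared standard error of each estimate
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical regression coefficients from m = 5 imputed data sets
# (illustrative numbers, not results from the BioCycle data).
example_estimates = [0.42, 0.38, 0.45, 0.40, 0.41]
example_variances = [0.010, 0.012, 0.011, 0.009, 0.010]
pooled, total_var = pool_rubin(example_estimates, example_variances)
print(f"pooled estimate = {pooled:.3f}, standard error = {total_var ** 0.5:.3f}")
```

The between-imputation term is what distinguishes MI from single imputation: a single replaced value gives no measure of the uncertainty about the unreported measurement, so its standard errors tend to be too small.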

Data Description and Methods
Progesterone and Vitamin E in Women
Simulations
Findings
Conclusions
