Abstract

We consider the problem of estimating the density of a random variable when precise measurements on the variable are not available, but replicated proxies contaminated with measurement error are available for sufficiently many subjects. Under the assumption of additive measurement errors this reduces to a problem of deconvolution of densities. Deconvolution methods often make restrictive and unrealistic assumptions about the density of interest and the distribution of measurement errors, for example, normality and homoscedasticity and thus independence from the variable of interest. This article relaxes these assumptions and introduces novel Bayesian semiparametric methodology based on Dirichlet process mixture models for robust deconvolution of densities in the presence of conditionally heteroscedastic measurement errors. In particular, the models can adapt to asymmetry, heavy tails, and multimodality. In simulation experiments, we show that our methods vastly outperform a recent Bayesian approach based on estimating the densities via mixtures of splines. We apply our methods to data from nutritional epidemiology. Even in the special case when the measurement errors are homoscedastic, our methodology is novel and dominates other methods that have been proposed previously. Additional simulation results, instructions on getting access to the dataset and R programs implementing our methods are included as part of online supplementary materials.

Highlights

  • Many problems of practical importance require estimation of the unknown density of a random variable

  • The results show that the mean integrated squared error (MISE) of these three models is very poor for heavy-tailed error distributions, and that the MISE increases with sample size because of the growing number of outliers

  • Our analysis showed that the measurement error distributions of all dietary components included in the Eating at America’s Table (EATS) study deviate from normality and exhibit strong conditional heteroscedasticity


Summary

Introduction

Many problems of practical importance require estimation of the unknown density of a random variable. For modeling conditionally heteroscedastic measurement errors, it is assumed that the measurement errors can be factored into ‘scaled errors’, which are independent of the variable of interest and have zero mean and unit variance, and a ‘variance function’ component that explains the conditional heteroscedasticity. This multiplicative structural assumption on the measurement errors was implicit in Staudenmayer et al. (2008), where the scaled errors were assumed to come from a standard normal distribution. This gives us the flexibility to model other aspects of the distribution of scaled errors. Our deconvolution approach uses flexible Dirichlet process mixture models twice, first to model the density of interest and second to model the density of the scaled errors, freeing both from restrictive parametric assumptions while accommodating conditional heteroscedasticity through the variance function. The supplementary materials provide results of additional simulation experiments and R programs implementing our methods.
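
The multiplicative error structure described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes replicated proxies of the form W_ij = X_i + sqrt(v(X_i)) * e_ij, and the gamma density for X, the quadratic variance function v, and the standardized Laplace scaled errors are all hypothetical choices made only for illustration. Any zero-mean, unit-variance distribution could stand in for the scaled errors.

```python
import math
import random
import statistics


def variance_function(x):
    # Hypothetical variance function v(x): error variance grows with x.
    return 0.05 + 0.25 * x * x


def scaled_error(rng):
    # Standardized Laplace draw: the difference of two unit exponentials is
    # Laplace(0, 1) with variance 2, so dividing by sqrt(2) gives unit variance.
    return (rng.expovariate(1.0) - rng.expovariate(1.0)) / math.sqrt(2.0)


def simulate_replicates(n_subjects, n_replicates, seed=0):
    # Returns a list of (x, [w_1, ..., w_m]) pairs, one per subject.
    rng = random.Random(seed)
    data = []
    for _ in range(n_subjects):
        x = rng.gammavariate(4.0, 0.5)  # hypothetical skewed density for X
        sd = math.sqrt(variance_function(x))
        w = [x + sd * scaled_error(rng) for _ in range(n_replicates)]
        data.append((x, w))
    return data


def mean_within_variance_by_half(data):
    # Average within-subject sample variance of the replicates, computed
    # separately for subjects with small and large latent x.
    ordered = sorted(data, key=lambda pair: pair[0])
    half = len(ordered) // 2
    low = statistics.mean(statistics.variance(w) for _, w in ordered[:half])
    high = statistics.mean(statistics.variance(w) for _, w in ordered[half:])
    return low, high


if __name__ == "__main__":
    sample = simulate_replicates(2000, 3, seed=1)
    low, high = mean_within_variance_by_half(sample)
    print("mean within-subject variance, low-x half:", low)
    print("mean within-subject variance, high-x half:", high)
```

In this sketch the within-subject variance of the replicates is larger for subjects with larger latent values, which is the conditional heteroscedasticity that the variance function is meant to capture. In practice X is unobserved, so such a split would have to rely on subject-specific means of the proxies instead.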

Background
Modeling the Distribution of X
Modeling the Variance Function
Model-I
Model-II
Model-III
Model Diagnostics
Simulation Experiments
Semiparametric Truth
Nonparametric Truth
Data Description and Model Validation
Results for Daily Intakes of Folate
Summary
Data Transformation and Homoscedasticity
Extensions
Updating the parameters of the distribution of X
Updating the parameters of the distribution of scaled errors
Updating the parameters of the variance function
