Abstract

This paper deals with nonparametric estimation of conditional densities in mixture models when additional covariates are available. The proposed approach first runs a clustering algorithm on the additional covariates to guess the mixture component of each observation. The conditional densities of the mixture model are then estimated using kernel density estimates applied separately to each cluster. We investigate the expected $L_{1}$-error of the resulting estimates and derive optimal rates of convergence over classical nonparametric density classes, provided the clustering method is accurate. The performance of a clustering algorithm is measured by its maximal misclassification error. We obtain upper bounds on this quantity for a single linkage hierarchical clustering algorithm. Lastly, applications of the proposed method to mixture models involving electricity distribution data and simulated data are presented.

Highlights

  • Finite mixture models are widely used to account for population heterogeneities

  • Even if the identification of these connected components is important in our setting, it is not sufficient since our goal is to find an upper bound of the misclassification error (2.8)

  • This paper provides a new framework to estimate conditional densities in mixture models in the presence of covariates


Summary

Introduction

Finite mixture models are widely used to account for population heterogeneities. In many fields such as biology, econometrics and social sciences, experiments are based on the analysis of a variable whose behavior differs depending on the group of individuals. A natural way to model heterogeneity for a real random variable Y is to use a mixture model. In this case, the density f of Y can be written as $f(y) = \sum_{i=1}^{M} \alpha_i f_i(y)$, where the $\alpha_i$ are nonnegative mixing weights summing to one and the $f_i$ are the component densities. The number of components M is unknown and needs to be estimated. To this end, some algorithms have been developed to provide consistent estimates of this parameter. In a nonparametric setting, it turns out that identifiability conditions are more difficult to provide. Hall and Zhou (2003) define mild regularity conditions to achieve identifiability in a multivariate nonparametric setting, while Kitamura (2004) considers the case where appropriate covariates are available.
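The two-step procedure described in the abstract (cluster the covariates, then apply a kernel density estimate within each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the Gaussian kernel, the fixed bandwidth and the simulated data are all assumptions made for illustration, and the number of clusters is taken as known. Single linkage is implemented by merging the closest pairs of points until the requested number of clusters remains.

```python
import math
import random
from itertools import combinations


def single_linkage(points, n_clusters):
    """Single-linkage clustering of a list of coordinate tuples.

    Merges the two closest clusters repeatedly (Kruskal's algorithm on the
    complete distance graph) and stops when n_clusters components remain.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    remaining = n
    for _, i, j in edges:
        if remaining == n_clusters:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            remaining -= 1
    roots = {}
    # Relabel components as 0, 1, ... in order of first appearance.
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


def gaussian_kde(sample, bandwidth):
    """Return a function y -> Gaussian kernel density estimate from sample."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)

    def estimate(y):
        return sum(
            math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in sample
        ) / norm

    return estimate


def cluster_then_kde(covariates, responses, n_clusters, bandwidth):
    """Guess each observation's component from its covariates, then
    estimate one density of the response per guessed component."""
    labels = single_linkage(covariates, n_clusters)
    groups = {}
    for label, y in zip(labels, responses):
        groups.setdefault(label, []).append(y)
    return labels, {k: gaussian_kde(v, bandwidth) for k, v in groups.items()}


# Hypothetical simulated data: two well-separated covariate groups,
# with a different law for the response Y in each group.
random.seed(0)
covariates = [(random.uniform(0.0, 1.0),) for _ in range(100)]
covariates += [(random.uniform(5.0, 6.0),) for _ in range(100)]
responses = [random.gauss(-2.0, 1.0) for _ in range(100)]
responses += [random.gauss(3.0, 1.0) for _ in range(100)]

labels, densities = cluster_then_kde(
    covariates, responses, n_clusters=2, bandwidth=0.5
)
```

In this toy setting the covariate groups are far apart, so single linkage recovers them exactly and each per-cluster estimate concentrates around the mean of its own component. When the covariate groups overlap, some observations are misclassified, which is why the abstract ties the quality of the density estimates to the misclassification error of the clustering step.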

