Abstract

We propose a model-based clustering procedure in which each component accommodates cluster-specific mild outliers through a flexible distributional assumption, while a proportion of observations is additionally trimmed to handle gross outliers. Estimation and selection of the proportions of mild and gross outliers are carried out through a penalized likelihood approach, for which we obtain a theoretically grounded penalty parameter. Simulation studies illustrate the advantages of our procedure over flexible mixtures without trimming and over trimmed normal mixture models (tclust). We conclude with an original real data example on identifying the source of illicit drug shipments seized in Italy and Spain. The methodology proposed in this paper has been implemented in R functions, which can be downloaded from https://github.com/afarcome/cntclust.
