Abstract

Density-based clustering relies on the idea of linking groups to specific features of the probability distribution underlying the data. The reference to a true, yet unknown, population structure allows framing the clustering problem in a standard inferential setting, where the concept of ideal population clustering is defined as the partition induced by the true density function. The nonparametric formulation of this approach, known as modal clustering, draws a correspondence between the groups and the domains of attraction of the density modes. Operationally, a nonparametric density estimate is required, and a proper selection of the amount of smoothing, which governs the shape of the density and hence possibly the modal structure, is crucial to identify the final partition. In this work, we address the issue of density estimation for modal clustering from an asymptotic perspective. A natural and easy-to-interpret metric for measuring the distance between density-based partitions is discussed, its asymptotic approximation is explored, and this approximation is employed to study the problem of bandwidth selection for nonparametric modal clustering.

Highlights

  • Clustering is commonly referred to as the task of finding groups in a set of data points

  • We propose new data-based bandwidth selectors designed for modal clustering purposes

  • The modal clustering methodology provides a framework to perform cluster analysis with a clear and explicit population goal. It allows clusters of arbitrary shape and size, which can be captured by means of a nonparametric density estimator


Summary

Introduction

Clustering is commonly referred to as the task of finding groups in a set of data points (see [24], [16] or [21]). The density-based approach attempts to circumscribe this issue by framing the problem in a statistically rigorous setting where the observed data are assumed to be realizations of a random variable, and the clusters are defined with respect to some characteristic of its underlying probability distribution. Our main result provides an asymptotic approximation for the considered metric, which allows us to introduce new automatic bandwidth selection procedures designed for nonparametric modal clustering. The accuracy of this approximation and the practical performance of the new methods, with respect to the proposed error criterion, are extensively studied via simulations and compared with some plausible competitors.
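To make the role of the bandwidth concrete, the following is a minimal illustrative sketch of modal clustering in one dimension. It assumes a Gaussian kernel density estimate and uses the mean-shift iteration to move each observation uphill to a mode; mean shift is one common way to approximate the domains of attraction and is not necessarily the procedure used in the paper. The function names `mean_shift_modes` and `modal_clusters`, the tolerances, and the bandwidth values are hypothetical choices made for illustration only, and the bandwidths shown are picked by hand rather than by the selectors proposed in this work.

```python
import numpy as np

def mean_shift_modes(data, h, tol=1e-6, max_iter=500):
    """Move every observation uphill on a Gaussian KDE with bandwidth h.

    Hypothetical helper: returns the point each mean-shift path has
    reached when the updates fall below tol (or after max_iter steps).
    """
    x = data.astype(float).copy()
    for _ in range(max_iter):
        # Gaussian kernel weights between current positions and the data
        w = np.exp(-0.5 * ((x[:, None] - data[None, :]) / h) ** 2)
        # Mean-shift update: kernel-weighted average of the data
        x_new = (w @ data) / w.sum(axis=1)
        converged = np.max(np.abs(x_new - x)) < tol
        x = x_new
        if converged:
            break
    return x

def modal_clusters(data, h, merge_tol=1e-2):
    """Label observations by the mode their mean-shift path converges to.

    Limit points closer than merge_tol (an assumed tolerance) are
    treated as the same mode. Labels are ordered by mode location.
    """
    limits = mean_shift_modes(data, h)
    order = np.argsort(limits)
    labels = np.empty(len(data), dtype=int)
    current = 0
    labels[order[0]] = current
    for prev, nxt in zip(order[:-1], order[1:]):
        if limits[nxt] - limits[prev] > merge_tol:
            current += 1  # a gap between limit points starts a new mode
        labels[nxt] = current
    return labels, limits

# Simulated two-component sample; the number of clusters found
# depends on the amount of smoothing h.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-3, 1, 150), rng.normal(3, 1, 150)])
for h in (0.1, 0.5, 5.0):
    labels, _ = modal_clusters(data, h)
    print(f"h = {h}: {labels.max() + 1} cluster(s)")
```

Running the loop with different values of `h` shows how the induced partition changes with the amount of smoothing: small bandwidths tend to produce spurious modes and hence extra clusters, while large bandwidths can merge genuinely distinct groups.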

Background
Asymptotic bandwidth selection for modal clustering
Some remarks
Numerical results
Multidimensional generalization
Conclusions
A Proofs
B Parameter settings
