Abstract
For a density $f$ on $\mathbb{R}^d$, a high-density cluster is any connected component of $\{x : f(x) \geq \lambda\}$, for some $\lambda > 0$. The set of all high-density clusters forms a hierarchy called the cluster tree of $f$. We present two procedures for estimating the cluster tree given samples from $f$. The first is a robust variant of the single linkage algorithm for hierarchical clustering. The second is based on the $k$-nearest neighbor graph of the samples. We give finite-sample convergence rates for these algorithms, which also imply consistency, and we derive lower bounds on the sample complexity of cluster tree estimation. Finally, we study a tree pruning procedure that, under milder conditions than usual, is guaranteed to remove spurious clusters while recovering salient ones.
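To make the level-set definition concrete, here is a minimal sketch (not the paper's algorithm) of recovering high-density clusters at a single level $\lambda$: estimate the density at each sample via its $k$-nearest-neighbor radius, discard points whose estimate falls below $\lambda$, and take connected components of a neighborhood graph on the survivors. The function names, the mutual-radius edge rule, and the use of $r^d$ (dropping the ball-volume constant) are illustrative choices, not from the abstract.

```python
import numpy as np

def knn_radii(X, k):
    # Distance from each point to its k-th nearest neighbor
    # (column 0 of the sorted distance matrix is the self-distance 0).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return np.sort(D, axis=1)[:, k]

def clusters_at_level(X, k, lam):
    """Connected components of a neighborhood graph restricted to points
    whose k-NN density estimate is at least lam (one level of the tree)."""
    n, d = X.shape
    r = knn_radii(X, k)
    # k-NN density estimate: f_hat(x_i) proportional to k / (n * r_i^d);
    # the ball-volume constant is dropped, so lam is on the same scale.
    f_hat = k / (n * np.maximum(r, 1e-12) ** d)
    keep = np.where(f_hat >= lam)[0]
    # Union-find over surviving points; join pairs that lie within each
    # other's k-NN radius (an illustrative edge rule, not the paper's).
    parent = {i: int(i) for i in keep}
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for a in keep:
        for b in keep:
            if a < b and np.linalg.norm(X[a] - X[b]) <= min(r[a], r[b]):
                parent[find(a)] = find(b)
    comps = {}
    for i in keep:
        comps.setdefault(find(i), []).append(int(i))
    return list(comps.values())
```

Sweeping `lam` from high to low and tracking how components merge yields an estimate of the full cluster tree; the paper's two procedures refine this idea with guarantees.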