Abstract

We consider the problem of clustering a dataset when only multiple noisy observations of each of its members are available. The goal is to obtain a clustering that is as faithful as possible to the clustering of the original dataset. We propose a centroidal approach whose distortion measure is the sum of the r-th powers of the distances between a cluster center and the noisy observations. For r = 2, our scheme reduces to the well-known approach of clustering the averages of the noisy samples. First, we provide a mathematical analysis of our clustering scheme; in particular, we derive formulas for the average distortion and for the spatial distribution of the cluster centers in the asymptotic regime where the number of centers is large. We then provide an algorithm to numerically optimize the cluster centers in the finite regime, and we extend our method to assign weights to the noisy observations automatically. Finally, we show that for various practical noise models, with a suitable choice of r, our algorithms can outperform several existing techniques on various datasets.
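The abstract does not spell out the optimization algorithm, so the following is only a minimal Lloyd-style sketch of the distortion it describes, written under our own assumptions: Euclidean distances, each data point represented by the list of its noisy observations, center updates by iteratively reweighted averaging (a standard technique that we assume is appropriate only for 1 <= r <= 2), and farthest-first initialization. The function names dist, mean, distortion, update_center and cluster are hypothetical. This is not the paper's algorithm, and it omits the automatic weighting of observations.

```python
import math

Point = tuple[float, ...]

def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points: list[Point]) -> Point:
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(p[k] for p in points) / len(points) for k in range(len(points[0])))

def distortion(center: Point, observations: list[Point], r: float) -> float:
    """Sum of r-th powers of distances from a center to a set of noisy observations."""
    return sum(dist(center, y) ** r for y in observations)

def update_center(center: Point, observations: list[Point], r: float,
                  iters: int = 100, eps: float = 1e-12) -> Point:
    """Approximately minimize distortion(c, observations, r) over c, assuming 1 <= r <= 2.

    Iteratively reweighted averaging: observation y gets weight ||c - y||^(r - 2),
    with distances clamped at eps to avoid division by zero. For r = 2 every weight
    is 1, so the result is the plain mean; for r = 1 this is Weiszfeld's iteration.
    """
    c = center
    for _ in range(iters):
        weights = [max(dist(c, y), eps) ** (r - 2) for y in observations]
        total = sum(weights)
        c = tuple(sum(w * y[k] for w, y in zip(weights, observations)) / total
                  for k in range(len(c)))
    return c

def cluster(items: list[list[Point]], k: int, r: float, rounds: int = 20):
    """Lloyd-style clustering of items that are known only through noisy observations."""
    # Farthest-first initialization on the per-item averages (an illustrative choice).
    avgs = [mean(obs) for obs in items]
    centers = [avgs[0]]
    while len(centers) < k:
        centers.append(max(avgs, key=lambda a: min(dist(a, c) for c in centers)))
    labels: list[int] = []
    for _ in range(rounds):
        # Assignment: each item goes to the center with the smallest r-distortion
        # summed over all of that item's observations.
        labels = [min(range(k), key=lambda j: distortion(centers[j], obs, r))
                  for obs in items]
        # Update: pool the observations of all items assigned to each center.
        for j in range(k):
            pooled = [y for obs, lab in zip(items, labels) if lab == j for y in obs]
            if pooled:
                centers[j] = update_center(centers[j], pooled, r)
    return centers, labels

if __name__ == "__main__":
    items = [
        [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5)],
        [(1.0, 0.0), (1.5, 0.2), (0.8, -0.3)],
        [(10.0, 10.0), (10.5, 9.8), (9.7, 10.1)],
        [(11.0, 10.0), (10.6, 10.4), (11.2, 9.9)],
    ]
    centers, labels = cluster(items, k=2, r=1.0)
    print(labels)
```

For r = 2 the sketch behaves like k-means on the per-item averages: the sum of squared distances from c to m observations equals m times the squared distance from c to their average plus a term that does not depend on c, so both the assignment and the update depend only on the averages (when every item has the same number of observations). Smaller values of r weight each observation's contribution less steeply, which is the kind of trade-off the choice of r controls.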
