Subquadratic Approximation Algorithms for Clustering Problems in High Dimensional Spaces

Allan Borodin, Yuval Rabani, Rafail Ostrovsky

doi:10.1023/b:mach.0000033118.09057.80

Abstract

One of the central problems in information retrieval, data mining, computational biology, statistical analysis, computer vision, geographic analysis, pattern recognition, and distributed protocols is the classification of data according to some clustering rule. Often the data is noisy, and even approximate classification is of extreme importance. The difficulty of such classification stems from the fact that the data usually has many incomparable attributes, which often leads to clustering problems in high dimensional spaces. Because they require measuring the distance between every pair of data points, standard algorithms for computing exact clustering solutions run in quadratic or “nearly quadratic” time, i.e., $O(dn^{2-\alpha(d)})$ time, where $n$ is the number of data points, $d$ is the dimension of the space, and $\alpha(d)$ approaches 0 as $d$ grows. In this paper, we show (for three fairly natural clustering rules) that computing an approximate solution can be done much more efficiently. More specifically, for agglomerative clustering (used, for example, in the Alta Vista™ search engine), for the clustering defined by sparse partitions, and for a clustering based on minimum spanning trees, we derive randomized $(1 + \epsilon)$ approximation algorithms with running times $\tilde{O}(d^2 n^{2-\gamma})$, where $\gamma > 0$ depends only on the approximation parameter $\epsilon$ and is independent of the dimension $d$.
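To make the quadratic baseline concrete, the following is a minimal sketch of an exact minimum-spanning-tree clustering that measures the distance between every pair of points, at a cost of roughly $O(dn^2)$. It is not the paper's randomized approximation algorithm. The function names (`euclidean`, `mst_edges`, `mst_clusters`), the Euclidean metric, and the rule of dropping the k-1 heaviest tree edges are all assumptions made for illustration; the abstract does not specify them.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two d-dimensional points; O(d) time."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mst_edges(points):
    """Exact MST of the complete distance graph via Prim's algorithm.

    Every pair of points is compared once, so this costs O(d * n^2).
    Returns a list of (weight, u, v) tuples.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known distance from the tree to each point
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest point that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((best[u], parent[u], u))
        # Relax distances from the newly added point to all remaining points.
        for v in range(n):
            if not in_tree[v]:
                dist = euclidean(points[u], points[v])
                if dist < best[v]:
                    best[v] = dist
                    parent[v] = u
    return edges


def mst_clusters(points, k):
    """Assumed rule: drop the k-1 heaviest MST edges and label the components."""
    n = len(points)
    edges = sorted(mst_edges(points))
    kept = edges[: max(0, len(edges) - (k - 1))]
    label = list(range(n))  # union-find parent array

    def find(x):
        while label[x] != x:
            label[x] = label[label[x]]  # path halving
            x = label[x]
        return x

    for _, u, v in kept:
        label[find(u)] = find(v)
    return [find(i) for i in range(n)]
```

Calling `mst_clusters(points, k)` returns one component label per point, and points sharing a label belong to the same cluster. The inner loop over all remaining points is the pairwise distance work that the subquadratic algorithms described in the abstract are designed to avoid.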

Journal: Machine Learning
Publication Date: Jul 1, 2004
Citations: 21

Similar Papers

Subquadratic approximation algorithms for clustering problems in high dimensional spaces
Allan Borodin ... Yuval Rabani
01 May 1999

QLEC
Ke Li ... Guihai Chen
05 Aug 2019

A Novel PCA-Based Bayes Classifier and Face Analysis
Zhong Jin ... Franck Davoine
01 Jan 2004

On Optimizing Nearest Neighbor Queries in High-Dimensional Data Spaces
Stefan Berchtold ... Florian Krebs
01 Jan 2001
