Abstract

Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. The clustering problem has been widely studied in machine learning, databases, and statistics. This paper studies the problem of clustering high dimensional data. The paper proposes an algorithm called the CoFD algorithm, which is a non-distance based clustering algorithm for high dimensional spaces. Based on the Maximum Likelihood Principle, CoFD attempts to optimize its parameter settings to maximize the likelihood between data points and the model generated by the parameters. The distributed versions of the problem, called the D-CoFD algorithms, are also proposed. Experimental results on both synthetic and real data sets show the efficiency and effectiveness of CoFD and D-CoFD algorithms.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call