On Application of a Probabilistic K-Nearest Neighbors Model for Cluster Validation Problem

Zeev Volkovich, Zeev Barzily, Renata Avros, Dvora Toledano-Kitai

doi:10.1080/03610926.2011.562786

Abstract

K-Nearest Neighbors is a widely used technique for classifying and clustering data. In this article, we address the cluster stability problem using probabilistic characteristics of this approach. We estimate the stability of partitions obtained by clustering pairs of samples. Partitions are presumed to be consistent if their clusters are stable. Cluster validity is quantified by the number of a point's K nearest neighbors that belong to the point's own sample. Under the null hypothesis that the samples are well mixed within the clusters, this quantity follows a binomial distribution with K trials and success probability 0.5. Each cluster is represented by an index summarizing the p-values computed for all of its objects by testing the null hypothesis against the alternative, and the quality of a partition is evaluated through its worst cluster. The true number of clusters is taken to be the one whose empirical index distribution has the maximal suitable asymmetry. The proposed methodology produces the index distributions sequentially and assesses their asymmetry. Numerical experiments show that the methodology is well able to expose the true number of clusters.
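
The per-point test described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the alternative hypothesis, the summarizing index, or the distance, so the following are assumptions made only for illustration: a one-sided upper-tail p-value (many same-sample neighbors suggests poor mixing), the mean p-value as the cluster index, the minimum index as the "worst cluster", and Euclidean distance. The function names are hypothetical and are not taken from the paper.

```python
import math
from collections import defaultdict


def binom_upper_tail(count, k, p=0.5):
    """P(X >= count) for X ~ Binomial(k, p); the one-sided tail is an assumption."""
    return sum(
        math.comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(count, k + 1)
    )


def same_sample_counts(points, samples, k):
    """For each point, count how many of its k nearest neighbors share its sample label.

    Assumes there are more than k points; distance is Euclidean (an assumption).
    """
    counts = []
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        counts.append(sum(samples[j] == samples[i] for j in others[:k]))
    return counts


def partition_index(points, samples, clusters, k):
    """Return (worst cluster index, per-cluster indices) for one partition.

    The cluster index here is the mean p-value and the worst cluster is the
    one with the smallest index; both choices are illustrative assumptions.
    """
    by_cluster = defaultdict(list)
    for idx, c in enumerate(clusters):
        by_cluster[c].append(idx)

    indices = {}
    for c, members in by_cluster.items():
        pts = [points[i] for i in members]
        smp = [samples[i] for i in members]
        # Neighbors are searched inside the cluster only.
        counts = same_sample_counts(pts, smp, k)
        pvals = [binom_upper_tail(n, k) for n in counts]
        indices[c] = sum(pvals) / len(pvals)
    return min(indices.values()), indices
```

In the procedure the abstract outlines, such an index would be computed repeatedly over pairs of samples for each candidate number of clusters, and the resulting empirical distributions would then be compared by their asymmetry. That outer loop is not shown here.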

Journal: Communications in Statistics - Theory and Methods
Publication Date: Aug 15, 2011
Citations: 18
