Evidence Accumulation from Some Clustering Algorithms to Improve Gene Expression Data Classification

Ranjita Das, Sriparna Saha

doi:10.1109/iscmi.2016.54

Abstract

Ensemble-based clustering combines the data partitions produced by multiple clustering algorithms. We consider several recently developed clustering algorithms, namely the point-symmetry-distance-based genetic clustering technique (GAPS) and symmetry-based differential evolution and particle swarm optimization clustering algorithms, together with the popular K-means and fuzzy C-means algorithms, as the base approaches for generating multiple clustering solutions. Each base algorithm decomposes the initial N × d data (N patterns, each d-dimensional) into k compact clusters. The goal of ensemble clustering is to derive a single combined solution from the set of individual partitionings and thereby increase the accuracy of the final partitioning. The evidence on pattern association is accumulated by a link-based ensemble method called CTS. This maps the partitionings into an N × N matrix that represents a new similarity measure between patterns. The final data partition is obtained by applying the single-linkage clustering algorithm to this new similarity matrix. For the experiments, several publicly available gene expression datasets are used. To validate the clustering solutions obtained from the link-based cluster ensemble method as well as from the individual base clustering algorithms, two internal cluster validity indices, the DB index and the Dunn index, are used.
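The pipeline described in the abstract (base partitions, an N × N similarity matrix, then single linkage) can be illustrated with a minimal sketch. Note the assumptions: the similarity used here is a plain co-association matrix (the fraction of base partitions that place two patterns in the same cluster), which is a simpler stand-in for the CTS link-based similarity and not the authors' method. The function names `co_association` and `single_linkage` and the toy label vectors are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def co_association(partitions):
    """N x N matrix: fraction of base partitions that co-cluster each pair.

    Assumption: this is plain evidence accumulation, not the CTS refinement.
    """
    n = len(partitions[0])
    m = len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]


def single_linkage(sim, k):
    """Single-linkage clustering on a similarity matrix, stopped at k clusters.

    Pairs are visited from most to least similar and merged with union-find,
    which yields the same clusters as agglomerative single linkage.
    """
    n = len(sim)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    pairs = sorted(combinations(range(n), 2), key=lambda p: -sim[p[0]][p[1]])
    clusters = n
    for i, j in pairs:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            clusters -= 1

    # Relabel cluster roots as 0, 1, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Toy base partitions of six patterns (hypothetical labels, not real data).
# The second partition equals the first up to a label swap, which the
# co-association matrix handles without any label matching.
base_partitions = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]

sim = co_association(base_partitions)
consensus = single_linkage(sim, k=2)
print(consensus)
```

In this toy case two of the three base partitions group pattern 2 with patterns 0 and 1, so the accumulated similarity outvotes the dissenting partition. A CTS-style similarity would additionally use links between clusters that share neighbours, which this sketch does not attempt.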

Full Text