Abstract

Ensemble clustering is a promising approach that combines the results of multiple clustering algorithms into a consensus partition by merging the individual partitions according to well-defined rules. In this study, we use an ensemble clustering approach to merge the results of five different clustering algorithms that are used in bioinformatics applications. The ensemble clustering result is tested on microarray data sets and compared with the results of the individual algorithms. One external cluster validation index, the adjusted Rand index (C-rand), and two internal cluster validation indices, silhouette and modularity, are used for the comparison.
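The abstract does not spell out the merging rules of [2]. As a rough illustration only, the sketch below swaps in a different, widely used consensus technique, co-association (evidence accumulation): it records how often each pair of items is co-clustered across the base partitions and links pairs that a majority of partitions agree on. The function names, the 0.5 threshold and the toy labels are hypothetical choices, not the study's method.

```python
from itertools import combinations


def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    partitions that put items i and j in the same cluster."""
    n = len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    k = len(partitions)
    return [[c / k for c in row] for row in counts]


def consensus_partition(partitions, threshold=0.5):
    """Stand-in consensus rule (assumption, not the rules of [2]): link every
    pair co-clustered by more than `threshold` of the base partitions and
    return the connected components as consensus labels."""
    matrix = co_association(partitions)
    n = len(matrix)
    parent = list(range(n))  # union-find forest over the items

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel components 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical labels from five base algorithms on six items.
base = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 2, 2],
    [0, 0, 0, 1, 1, 1],
    [2, 2, 2, 2, 0, 0],
]
print(consensus_partition(base))  # [0, 0, 0, 1, 1, 1]
```

Because the co-association matrix ignores the label values themselves, base partitions that use different cluster numbering (such as the first two rows above) still reinforce each other.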

Highlights

  • High-throughput data technologies allow the production and analysis of biological data to address critical questions related to disease prediction and gene function, among others

  • We investigated an application of the ensemble clustering approach described in [2] using five different clustering algorithms that have not previously been reported in an ensemble framework

  • We evaluated the relative performance of the individual algorithms and of the ensemble approach on three different biological data sets using two internal validation indices and one external (C-rand) validation index
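The external index compares a clustering with a reference partition (for example, known sample classes). A minimal pure-Python sketch of the Hubert-Arabie adjusted Rand index follows, assuming both partitions are given as equal-length label sequences over the same items; the function name is illustrative. It counts co-clustered item pairs from the contingency table and rescales so that chance agreement scores about 0 and identical partitions score 1.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie adjusted (corrected) Rand index of two partitions."""
    if len(labels_a) != len(labels_b):
        raise ValueError("partitions must label the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # no item pairs to compare
    # Pair counts from the contingency table n_ij and its marginals.
    index = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (index - expected) / (max_index - expected)


print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0 (same partition, relabeled)
print(round(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 2]), 4))  # 0.5714
```

Returning 1.0 in the degenerate case is a convention (scikit-learn's `adjusted_rand_score` does the same), since the index is otherwise undefined there.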


INTRODUCTION

We apply the ensemble clustering approach described in [2] to three different biological data sets. The base clustering algorithms used in the ensemble are hierarchical clustering (HC), K-means, dynamic tree cut (DTC), fuzzy C-means and a community structure finding algorithm (CSF). All of these algorithms except fuzzy C-means were used and described in a previous study [5], which compared the performance of the individual algorithms with one another. The remainder of the paper is organized as follows: Section 2 gives background on ensemble clustering, Section 3 applies ensemble clustering to the three biological data sets and presents the results, and Section 4 concludes and indicates directions for future studies.
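The community structure finding algorithm (CSF) and the modularity index both operate on a graph, so applying them to microarray data requires building a network over the genes or samples; this excerpt does not say how the study constructs it. The sketch below therefore assumes a symmetric, possibly weighted adjacency matrix is already available and computes Newman's modularity, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j), where k_i is the degree of node i, m the total edge weight and c_i the cluster label. Many community-finding algorithms search for partitions with high Q. The toy graph and function name are hypothetical.

```python
def modularity(adjacency, labels):
    """Newman modularity Q of a partition of an undirected graph given as a
    symmetric (possibly weighted) adjacency matrix."""
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    two_m = sum(degrees)  # each undirected edge is counted twice
    if two_m == 0:
        raise ValueError("graph has no edges")
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # Observed edge weight minus its expectation under the
                # configuration (degree-preserving) null model.
                q += adjacency[i][j] - degrees[i] * degrees[j] / two_m
    return q / two_m


# Toy graph (assumption): two triangles joined by the single edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1
print(round(modularity(A, [0, 0, 0, 1, 1, 1]), 4))  # 0.3571 (= 5/14)
```

Placing every node in a single cluster always gives Q = 0, which is why higher values indicate more within-cluster edge weight than the degree-preserving null model would predict.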

BACKGROUND
APPLICATION TO BIOLOGICAL DATA
CONCLUSION