Abstract

Microarrays are a well-established technique for understanding cellular functions through transcriptomic profiling. Various analytical and statistical approaches have been developed to capture the overall features of high-dimensional microarray datasets. Agglomerative Hierarchical Clustering (AHC) is one of the most widely used approaches to cluster analysis of gene expression data; however, little work has been done to compare the performance of clustering methods on such data. Some authors compared three or four AHC methods, and others at most five. All of these authors recommended the complete linkage method to researchers seeking the best method for clustering their gene expression data. This paper compares the performance of seven AHC methods for clustering gene expression data with respect to five major proximity measures. We used the corrected Rand (cR) index to compare the performance of each clustering method. We found that Ward's method performed best among all the AHC methods, and that the cosine proximity measure outperformed all the other measures on both Affymetrix and cDNA datasets.
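
As an illustration of the evaluation criterion named in the abstract, the following is a minimal sketch of a corrected Rand computation. It assumes that the cR index corresponds to the Hubert-Arabie adjusted Rand index, which the text shown here does not state explicitly; the function name `corrected_rand` and the choice to return 1.0 in degenerate cases are assumptions made for this example, not the authors' implementation.

```python
from collections import Counter
from math import comb


def corrected_rand(labels_true, labels_pred):
    """Adjusted (corrected) Rand index between two partitions of the same samples.

    Assumption: this follows the Hubert-Arabie formulation, where 1.0 means
    identical partitions and values near 0 mean chance-level agreement.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same samples")
    n = len(labels_true)
    if n < 2:
        # No sample pairs exist, so agreement is trivially perfect.
        return 1.0

    # Contingency counts: samples falling in each (true class, cluster) cell.
    cell_counts = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2        # agreement of a perfect match
    if max_index == expected:
        # Degenerate case, e.g. both partitions place everything in one cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)
```

Because the index depends only on which samples are grouped together, relabeling the clusters does not change the score, so a clustering result can be compared directly against known sample classes.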

Highlights

  • Cluster analysis programs are routinely run as a first step in microarray data analysis to summarize the data and group genes

  • Cluster analysis of gene expression microarray data is of increasing interest in the field of functional genomics

  • This paper presents a comparative study of seven Agglomerative Hierarchical Clustering (AHC) methods with respect to five proximity measures, applied to large-scale datasets
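
To make the idea of pairing a linkage method with a proximity measure concrete, here is a naive sketch of agglomerative clustering under cosine distance. It covers only single, complete and average linkage; Ward's method and the other linkage rules are not implemented. The names `cosine_distance`, `LINKAGES` and `agglomerate` are hypothetical, and the brute-force search is for illustration only, not the procedure used in the paper.

```python
from math import sqrt


def cosine_distance(u, v):
    """1 - cosine similarity. Assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


# Each linkage rule reduces the pairwise distances between two clusters
# to a single between-cluster distance.
LINKAGES = {
    "single": min,
    "complete": max,
    "average": lambda ds: sum(ds) / len(ds),
}


def agglomerate(profiles, k, linkage="average", distance=cosine_distance):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    n = len(profiles)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of profiles")
    reduce_fn = LINKAGES[linkage]

    # Precompute all pairwise distances between expression profiles.
    dist = [[distance(profiles[i], profiles[j]) for j in range(n)]
            for i in range(n)]

    clusters = [[i] for i in range(n)]  # start with one cluster per profile
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = reduce_fn([dist[i][j]
                               for i in clusters[a] for j in clusters[b]])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]

    # Convert the cluster membership lists into one label per profile.
    labels = [0] * n
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

A call such as `agglomerate(profiles, k=2, linkage="complete")` returns one cluster label per profile. Swapping the `distance` argument is how a different proximity measure would be tried under the same linkage rule.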

Introduction

Cluster analysis programs are routinely run as a first step in microarray data analysis to summarize the data and group genes. It is essential to know which clustering method is best for which type of microarray gene (cancer) data. DNA microarray technology has made it possible to simultaneously monitor the expression levels of thousands of genes during important biological processes and across collections of related samples. Only a small number of analyses in the literature evaluate the performance of different clustering methods applied to gene expression data. Three AHC methods (Single Linkage, Complete Linkage and Average Linkage) were used to assess clustering performance on gene expression data [7, 8, 16, 25]. Four AHC methods (Single Linkage, Complete Linkage, Average Linkage and Centroid Linkage) were used to evaluate the clustering performance in their

Methods
Distance and Similarity Measures for Gene Expression Data
Checking Validity of Clusters
Experiments and Results
Conclusion