Abstract

Clustering algorithms have been used to divide genes into groups according to the degree of their expression similarity. Such a grouping may suggest that the respective genes are correlated and/or co-regulated, which in turn indicates that the genes could share a common biological role. In this paper, four clustering algorithms are investigated and benchmarked against each other: k-means, cut-clustering, spectral and expectation-maximization. Their performance is studied on time series expression data, with the Dynamic Time Warping distance used to measure similarity between gene expression profiles. Four cluster validation measures are used to evaluate the clustering algorithms: Connectivity and Silhouette Index for estimating the quality of clusters, Jaccard Index for evaluating the stability of a clustering method, and Rand Index for assessing the accuracy. The obtained results are analyzed by Friedman's test and the Nemenyi post-hoc test. K-means is demonstrated to be significantly better than the spectral clustering algorithm under the Silhouette and Rand validation indices.

Keywords. gene expression data, graph-based clustering algorithm, minimum cut clustering, partitioning algorithm, dynamic time warping
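As an illustration of the similarity measure named above, the following is a minimal sketch of the textbook dynamic-programming form of Dynamic Time Warping between two expression profiles. The function name `dtw_distance`, the absolute-difference local cost and the absence of a warping-window constraint are assumptions made for this example; the abstract does not specify which DTW variant the study uses.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW between two numeric sequences (illustrative sketch).

    Assumes an absolute-difference local cost and no window constraint;
    the variant used in the study may differ.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b stays
                cost[i][j - 1],      # b advances, a stays
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because the alignment may stretch one profile against the other, two profiles with the same shape but a delayed or prolonged response can have distance zero, for example `dtw_distance([1, 2, 3], [1, 2, 2, 3])`. A pairwise matrix of such distances could then be passed to a clustering algorithm that accepts precomputed dissimilarities.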
