An efficient hybrid clustering algorithm for molecular sequences classification

Wei-Bang Chen

doi:10.1145/1185448.1185543

Abstract

The k-means clustering and hierarchical agglomerative clustering algorithms are two popular methods for partitioning data into groups. The k-means clustering algorithm heavily favors spherical clusters and handles noise poorly. To overcome these problems, profile hidden Markov models (HMMs) have been used to build a model for each cluster. However, this mixture method still randomly assigns the training data to one of the k clusters, which can produce empty groups. To solve this problem, we propose a hybrid clustering method that uses an agglomerative hierarchical clustering algorithm to pre-cluster molecular sequences into k clusters and then uses the pre-clustered data to generate the HMM profile for each group. Experiments on a molecular sequence dataset demonstrate the effectiveness and efficiency of the proposed clustering algorithm.
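
The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several pieces are assumptions made only for illustration: the distance (one minus the cosine similarity of 2-mer counts), average linkage, and above all the per-cluster model, where a smoothed k-mer frequency model stands in for a profile HMM. All function names (`kmer_counts`, `agglomerative`, `fit_model`, `hybrid_cluster`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def kmer_counts(seq, k=2):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def distance(a, b, k=2):
    """Assumed distance: 1 - cosine similarity of k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(ca[m] * cb[m] for m in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * \
        math.sqrt(sum(v * v for v in cb.values()))
    return 1.0 - dot / norm if norm else 1.0


def agglomerative(seqs, k):
    """Average-linkage agglomerative clustering, merged down to k clusters."""
    clusters = [[i] for i in range(len(seqs))]
    d = {(i, j): distance(seqs[i], seqs[j])
         for i, j in combinations(range(len(seqs)), 2)}

    def linkage(c1, c2):
        total = sum(d[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Pick the closest pair of clusters (x < y) and merge y into x.
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        merged = clusters.pop(y)
        clusters[x].extend(merged)
    return clusters


def fit_model(members, k=2, alphabet_size=4):
    """Stand-in for a profile HMM: add-one smoothed k-mer log-probabilities."""
    counts = Counter()
    for s in members:
        counts.update(kmer_counts(s, k))
    total = sum(counts.values()) + alphabet_size ** k
    return lambda kmer: math.log((counts[kmer] + 1) / total)


def score(model, seq, k=2):
    """Log-score of a sequence under a cluster model."""
    return sum(model(seq[i:i + k]) for i in range(len(seq) - k + 1))


def hybrid_cluster(seqs, k, rounds=5):
    """Pre-cluster hierarchically, then refine with per-cluster models."""
    labels = [0] * len(seqs)
    for c, members in enumerate(agglomerative(seqs, k)):
        for i in members:
            labels[i] = c
    for _ in range(rounds):
        models = [fit_model([s for s, l in zip(seqs, labels) if l == c])
                  for c in range(k)]
        new = [max(range(k), key=lambda c: score(models[c], s)) for s in seqs]
        if new == labels:  # assignments are stable, stop early
            break
        labels = new
    return labels


demo = ["ACACACACAC", "ACACACACAA", "CACACACACA",
        "GTGTGTGTGT", "TGTGTGTGTG", "GTGTGTGTGG"]
print(hybrid_cluster(demo, 2))
```

Because the initial partition comes from the hierarchical step rather than a random assignment, every one of the k clusters starts with at least one member, which is the property the abstract relies on to avoid empty groups. A real pipeline would presumably replace `fit_model` and `score` with profile HMM training and scoring on aligned sequences.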
