Abstract

The k-means and agglomerative hierarchical clustering algorithms are two popular methods for partitioning data into groups. The k-means algorithm strongly favors spherical clusters and does not handle noise adequately. To overcome these problems, profile Hidden Markov Models (HMMs) have been used to build a model for each cluster. However, this mixture method still assigns the training data randomly to one of the k clusters, which can produce empty groups. To solve this problem, we propose a hybrid clustering method that uses an agglomerative hierarchical clustering algorithm to pre-cluster molecular sequences into k clusters and then uses the pre-clustered data to build a profile HMM for each group. Experiments on a molecular sequence dataset demonstrate the effectiveness and efficiency of the proposed clustering algorithm.
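The two-stage idea can be illustrated with a minimal Python sketch. The simplifying assumptions here are ours and do not come from the paper. Sequences are equal-length DNA strings. Pre-clustering uses average-linkage agglomerative clustering on Levenshtein distance. Each group's model is an ungapped position-specific profile, i.e. the match-state-only special case of a profile HMM, rather than a full profile HMM with insert/delete states trained by Baum-Welch. The function names `agglomerative`, `build_profile` and `hybrid_cluster` are hypothetical.

```python
import math
from collections import Counter

ALPHABET = "ACGT"  # assumption: DNA sequences


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def agglomerative(seqs, k):
    """Average-linkage agglomerative clustering, stopping at k clusters.

    Starts from singletons and only merges existing clusters, so every
    returned cluster is non-empty.
    """
    if not 1 <= k <= len(seqs):
        raise ValueError("k must be between 1 and the number of sequences")
    d = [[edit_distance(s, t) for t in seqs] for s in seqs]
    clusters = [[i] for i in range(len(seqs))]

    def linkage(c1, c2):
        return sum(d[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > k:
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a].extend(clusters[b])  # a < b, so index a stays valid
        del clusters[b]
    return clusters


def build_profile(members, pseudocount=1.0):
    """Per-position emission probabilities (match states only, no indels)."""
    profile = []
    for pos in range(len(members[0])):
        counts = Counter(s[pos] for s in members)
        total = len(members) + pseudocount * len(ALPHABET)
        profile.append({c: (counts[c] + pseudocount) / total for c in ALPHABET})
    return profile


def log_likelihood(seq, profile):
    """Log-probability of seq under the position-specific profile."""
    return sum(math.log(col[c]) for c, col in zip(seq, profile))


def hybrid_cluster(seqs, k):
    """Pre-cluster into k groups, fit one profile per group, then relabel."""
    groups = agglomerative(seqs, k)
    profiles = [build_profile([seqs[i] for i in g]) for g in groups]
    labels = [
        max(range(len(profiles)), key=lambda c: log_likelihood(s, profiles[c]))
        for s in seqs
    ]
    return groups, labels


seqs = ["ACGTAC", "ACGTAA", "ACGTTC", "TTGACA", "TTGACC", "TAGACA"]
groups, labels = hybrid_cluster(seqs, 2)
print(groups, labels)
```

The models are seeded from a hierarchical pre-clustering rather than a random assignment. Agglomerative merging only ever combines clusters that already hold sequences, so each of the k initial groups is guaranteed to contain at least one member before any model is fitted.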
