Abstract

In this paper, we report a framework for biological sequence clustering and classification. The framework adopts a two-phase hybrid method for clustering and then uses dynamic programming for classification. The two-phase hybrid method combines the strengths of hierarchical and partition clustering. Phase I uses hierarchical agglomerative clustering to pre-cluster the aligned sequences. Phase II performs partition clustering, which initializes its partition from the result of Phase I and uses profile Hidden Markov Models (HMMs) to represent clusters. The profile HMMs are then stored in a database for the classification of unknown sequences, which is done by finding the best alignment of a sequence to each existing profile HMM. However, a profile HMM and a sequence may differ in length. The dynamic programming technique proposed in our framework efficiently finds the optimal alignment for sequences of variable length, which enables the evaluation of cluster membership for any unknown sequence against fixed-length HMMs. Our experiments demonstrate the effectiveness and efficiency of the proposed framework for biological sequence clustering and classification.
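
The abstract does not spell out the dynamic programming recurrence, so the following is only a minimal sketch of the general idea, not the authors' implementation. It uses the textbook Viterbi recurrence for a profile HMM with match, insert, and delete states, scored as log-odds against a uniform background. Every name here (`build_profile`, `viterbi_score`, `classify`, `LOG_T`), the DNA alphabet, the transition probabilities, the add-one smoothing, and the omission of direct insert-to-delete transitions are assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")
ALPHABET = "ACGT"                  # assumed alphabet (DNA)
BACKGROUND = 1.0 / len(ALPHABET)   # assumed uniform background distribution

# Hypothetical transition log-probabilities, shared by every profile column.
# Direct insert<->delete transitions are left out as a simplification.
LOG_T = {
    ("M", "M"): math.log(0.90), ("M", "I"): math.log(0.05), ("M", "D"): math.log(0.05),
    ("I", "M"): math.log(0.70), ("I", "I"): math.log(0.30),
    ("D", "M"): math.log(0.70), ("D", "D"): math.log(0.30),
}

def build_profile(aligned, pseudocount=1.0):
    """Per-column residue probabilities from gap-free, equal-length aligned rows."""
    columns = []
    for column in zip(*aligned):
        total = len(column) + pseudocount * len(ALPHABET)
        columns.append({a: (column.count(a) + pseudocount) / total for a in ALPHABET})
    return columns

def viterbi_score(seq, profile):
    """Best log-odds score of aligning seq (any length) to a fixed-length profile."""
    n, k = len(seq), len(profile)
    # X[j][i]: best score after consuming i residues, ending in state X of column j.
    M = [[NEG_INF] * (n + 1) for _ in range(k + 1)]
    I = [[NEG_INF] * (n + 1) for _ in range(k + 1)]
    D = [[NEG_INF] * (n + 1) for _ in range(k + 1)]
    M[0][0] = 0.0  # the begin state is treated as match column 0
    for j in range(k + 1):
        for i in range(n + 1):
            if i > 0:
                # Insert: consume a residue and stay in column j.
                # Insert emissions equal the background, so their log-odds is 0.
                I[j][i] = max(M[j][i - 1] + LOG_T[("M", "I")],
                              I[j][i - 1] + LOG_T[("I", "I")])
            if j > 0:
                # Delete: skip column j without consuming a residue.
                D[j][i] = max(M[j - 1][i] + LOG_T[("M", "D")],
                              D[j - 1][i] + LOG_T[("D", "D")])
            if i > 0 and j > 0:
                # Match: emit residue i from column j.
                emit = math.log(profile[j - 1][seq[i - 1]] / BACKGROUND)
                M[j][i] = emit + max(M[j - 1][i - 1] + LOG_T[("M", "M")],
                                     I[j - 1][i - 1] + LOG_T[("I", "M")],
                                     D[j - 1][i - 1] + LOG_T[("D", "M")])
    return max(M[k][n], I[k][n], D[k][n])

def classify(seq, profiles):
    """Return the name of the stored profile that gives seq the highest score."""
    return max(profiles, key=lambda name: viterbi_score(seq, profiles[name]))

# Toy clusters of different profile lengths (made-up data, for illustration only).
profiles = {
    "A-rich": build_profile(["AAAA", "AAAA", "AAAT"]),
    "G-rich": build_profile(["GGGGGG", "GGGGGC"]),
}

if __name__ == "__main__":
    for query in ("AAA", "GGGGGGG"):
        print(query, "->", classify(query, profiles))
```

The delete states let a query shorter than the profile skip columns, and the insert states let a longer query carry extra residues, which is how a variable-length sequence can be scored against a fixed-length model. The table has O(nk) cells for a sequence of length n and a profile with k columns.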
