Abstract

Protein clustering is widely used to support in-depth analysis of protein functions and families. We discuss the design of an incremental protein clustering package that provides comprehensive features for protein function and family analysis. Specifically, the package offers several alternative options for carrying out high-quality protein clustering from different perspectives. The incremental nature of the clustering algorithm is essential for efficiently analyzing contemporary protein databases, whose sizes are growing rapidly. Regarding clustering quality, experiments applying the incremental algorithm to protein sequence analysis show that it identifies protein sequence clusters that match protein families more consistently than the single-link algorithm, which is the hierarchical clustering algorithm most widely used for protein sequence analysis. We also describe the implementation techniques used to improve system performance.
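For readers unfamiliar with the baseline named above, the sketch below shows one standard way to obtain single-link clusters. Cutting a single-link hierarchy at a similarity threshold is equivalent to taking the connected components of the graph whose edges are the sequence pairs scoring at or above that threshold. This illustrates only the single-link baseline, not the incremental algorithm discussed in this work. The function name `single_link_clusters`, the `(i, j, score)` input format, and the threshold value are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def single_link_clusters(
    n: int, pairs: List[Tuple[int, int, float]], threshold: float
) -> List[List[int]]:
    """Cut a single-link hierarchy over n sequences at `threshold`.

    Hypothetical helper: `pairs` holds (i, j, score) pairwise similarity
    scores. Two sequences share a cluster if and only if they are joined
    by a chain of pairs, each scoring >= threshold (connected components).
    """
    parent = list(range(n))  # union-find forest, one root per cluster

    def find(x: int) -> int:
        # Walk to the root, halving the path as we go to keep trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j, score in pairs:
        if score >= threshold:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri  # merge the two clusters

    # Group sequence indices by their root; each group comes out in ascending order.
    groups: Dict[int, List[int]] = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


if __name__ == "__main__":
    scores = [(0, 1, 0.9), (1, 2, 0.8), (3, 4, 0.95), (2, 3, 0.3)]
    print(single_link_clusters(5, scores, threshold=0.5))
```

With these made-up scores the call returns `[[0, 1, 2], [3, 4]]`: sequences 0 and 2 are grouped together through sequence 1 even though no direct score between them is given. This chaining behavior is a well-known property of single-link clustering.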
