Abstract

Background: Accurate identification of protein domain boundaries is useful for protein structure determination and prediction. However, predicting protein domain boundaries from a sequence is still very challenging and largely unsolved.

Results: We developed a new method to integrate the classification power of machine learning with evolutionary signals embedded in protein families in order to improve protein domain boundary prediction. The method first extracts putative domain boundary signals from a multiple sequence alignment between a query sequence and its homologs. The putative sites are then classified and scored by support vector machines in conjunction with input features such as sequence profiles, secondary structures, solvent accessibilities around the sites and their positions. The method was evaluated on a domain benchmark by 10-fold cross-validation, and 60% of true domain boundaries can be recalled at a precision of 60%. The trade-off between precision and recall can be adjusted according to specific needs by using different decision thresholds on the domain boundary scores assigned by the support vector machines.

Conclusions: The good prediction accuracy and the flexibility of selecting domain boundary sites at different precision and recall values make our method a useful tool for protein structure determination and modelling. The method is available at http://sysbio.rnet.missouri.edu/dobo/.
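The threshold trade-off described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `precision_recall`, the scores, and the labels below are all invented for illustration, and the scores merely stand in for the boundary scores a support vector machine might assign.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when sites scoring >= threshold are called boundaries.

    scores: hypothetical classifier scores, one per putative site.
    labels: True where the site is a real domain boundary.
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0  # convention when nothing is predicted
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data, invented for illustration only.
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False, False]

for threshold in (0.7, 0.5, 0.35):
    p, r = precision_recall(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  precision={p:.2f}  recall={r:.2f}")
```

On this toy data, lowering the threshold from 0.70 to 0.35 raises recall while lowering precision, which is the kind of adjustment the abstract refers to.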

Highlights

  • Accurate identification of protein domain boundaries is useful for protein structure determination and prediction

  • Signal coverage of domain boundaries: to ascertain the usefulness of domain boundary signals generated by multiple sequence alignments, we calculated the percentage of domain boundaries that had a signal within 20 residues

  • There were 462 such boundaries and we found that 391 had a domain boundary signal within 20 residues
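The coverage figure in the highlights above works out to 391 / 462, or about 84.6%. A minimal sketch of how such a coverage count could be computed follows. The function name `signal_coverage`, the toy positions, and the choice to treat "within 20 residues" as an inclusive window are assumptions made for illustration, not details taken from the paper.

```python
from bisect import bisect_left


def signal_coverage(boundaries, signals, window=20):
    """Count boundaries that have at least one signal within `window` residues.

    boundaries, signals: residue positions (integers).
    The window is treated as inclusive, which is an assumption.
    Returns (covered, total).
    """
    signals = sorted(signals)
    covered = 0
    for b in boundaries:
        # First signal at or after the left edge of the window.
        i = bisect_left(signals, b - window)
        if i < len(signals) and signals[i] <= b + window:
            covered += 1
    return covered, len(boundaries)


# Toy positions, invented for illustration only.
covered, total = signal_coverage([100, 250, 400], [85, 300, 415])
print(f"{covered} of {total} toy boundaries covered")

# Coverage implied by the counts reported in the text.
print(f"{391 / 462:.1%}")
```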

Introduction

Accurate identification of protein domain boundaries is useful for protein structure determination and prediction. However, predicting protein domain boundaries from a sequence is still very challenging and largely unsolved. It has been well over thirty years since Wetlaufer formally introduced what he termed structural regions of a protein chain: portions of the peptide sequence that assume a compact structure [1]. The identification and delineation of protein domains has since become more prominent, as this information eases the experimental determination of protein structure and can speed up computational approaches to protein structure prediction [3,4].
