Abstract

Motivation: The precise prediction of protein domains, which are the structural, functional and evolutionary units of proteins, has been a research focus in recent years. Although many methods have been presented for predicting protein domains and boundaries, the accuracy of predictions could be improved.

Results: In this study we present a novel approach, DomHR, which is an accurate predictor of protein domain boundaries based on a creative hinge region strategy. A hinge region was defined as a segment of amino acids that covers part of a domain region and a boundary region. We developed a strategy to construct profiles of domain-hinge-boundary (DHB) features generated by sequence-domain/hinge/boundary alignment against a database of known domain structures. The DHB features had three elements: normalized domain, hinge, and boundary probabilities. The DHB features were used as input to identify domain boundaries in a sequence. DomHR used a nonredundant dataset as the training set, the DHB features and predicted shape strings as features, and a conditional random field as the classification algorithm. In predicted hinge regions, a residue was assigned to a domain or a boundary according to a decision threshold. After the decision thresholds were optimized, DomHR was evaluated by cross-validation, large-scale prediction, an independent test and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DomHR outperformed other well-established, publicly available domain boundary predictors in prediction accuracy.

Availability: DomHR is available at http://cal.tongji.edu.cn/domain/.
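The two steps that the abstract describes most concretely, normalizing the three DHB elements and resolving predicted hinge residues with a decision threshold, can be illustrated with a minimal Python sketch. This is not the DomHR implementation: the abstract does not say how the probabilities are normalized or which score is compared with the threshold, so the sum-to-one normalization, the per-residue boundary score, the `D`/`H`/`B` label encoding, and the function names `normalize_dhb` and `resolve_hinge` are all assumptions made for illustration.

```python
from typing import List, Tuple


def normalize_dhb(counts: Tuple[float, float, float]) -> Tuple[float, ...]:
    """Scale raw (domain, hinge, boundary) scores so they sum to 1.

    Assumption: simple sum-to-one normalization; the paper may use another scheme.
    """
    total = sum(counts)
    if total == 0:
        # Assumption: fall back to a uniform distribution when there is no evidence.
        return (1 / 3, 1 / 3, 1 / 3)
    return tuple(c / total for c in counts)


def resolve_hinge(labels: List[str],
                  boundary_scores: List[float],
                  threshold: float) -> List[str]:
    """Reassign each predicted hinge residue ('H') to boundary ('B') or domain ('D').

    Assumption: a hinge residue becomes a boundary residue when its
    boundary score is at least `threshold`, and a domain residue otherwise.
    Residues already labeled 'D' or 'B' are left unchanged.
    """
    resolved = []
    for label, score in zip(labels, boundary_scores):
        if label == "H":
            resolved.append("B" if score >= threshold else "D")
        else:
            resolved.append(label)
    return resolved


# Hypothetical per-residue labels and boundary scores for a six-residue stretch.
example_labels = list("DDHHBB")
example_scores = [0.1, 0.1, 0.3, 0.7, 0.9, 0.9]
example_resolved = resolve_hinge(example_labels, example_scores, threshold=0.5)
```

In this toy input, the first hinge residue falls below the threshold and is treated as domain, while the second is treated as boundary. In the paper, the threshold itself is optimized rather than fixed at an arbitrary value such as 0.5.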

Highlights

  • Protein domains are the structural, functional and evolutionary units of proteins

  • S628 was extracted from Cheng’s package [5], in which the sequence identity of each pair of protein chains was less than 25%, the domain number of the proteins agreed in both SCOP (v 1.75) and CATH (v3.3.0), and any protein whose length was less than 90 residues was removed

  • In this work, we proposed a hybrid method, DomHR, to accurately predict domain boundaries in proteins based on a creative hinge region strategy
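Two of the S628 selection criteria listed in the highlights, the minimum length and the agreement between SCOP and CATH domain counts, can be sketched as a simple filter. The `Chain` record and its field names are hypothetical, and the pairwise sequence identity cutoff (less than 25%) is deliberately left out because it requires a sequence alignment step that the text does not describe.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chain:
    """Hypothetical record for one protein chain; field names are illustrative."""
    chain_id: str
    length: int          # number of residues
    scop_domains: int    # domain count according to SCOP
    cath_domains: int    # domain count according to CATH


def keep_chain(chain: Chain, min_length: int = 90) -> bool:
    """Keep chains of at least `min_length` residues whose SCOP and CATH
    domain counts agree. The pairwise identity filter is not applied here."""
    return chain.length >= min_length and chain.scop_domains == chain.cath_domains


def filter_chains(chains: List[Chain]) -> List[Chain]:
    """Return only the chains that pass `keep_chain`."""
    return [c for c in chains if keep_chain(c)]


# Made-up chains used only to exercise the filter.
candidates = [
    Chain("a", length=120, scop_domains=2, cath_domains=2),  # kept
    Chain("b", length=80, scop_domains=1, cath_domains=1),   # too short
    Chain("c", length=200, scop_domains=2, cath_domains=3),  # counts disagree
]
kept_ids = [c.chain_id for c in filter_chains(candidates)]
```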


Summary

Introduction

Protein domains are the structural, functional and evolutionary units of proteins. Most domains are single continuous polypeptide segments, while a few consist of several discontinuous segments. Small proteins often consist of only a single domain, while many large proteins comprise two or more structural domains [2]. The exact identification of protein domains and their boundaries is important for protein classification, for the study of protein structure, function, and evolution, and for drug discovery, disease treatment and genetic engineering. Manual identification and annotation of proteins lags behind the rate at which new sequences are generated. To fill this gap, a computational approach to domain identification is highly desirable.

