Abstract

A variety of protein domain predictors have been developed in recent years to predict protein domain boundaries, but most of them cannot predict discontinuous domains. Because nearly 40% of multidomain proteins contain one or more discontinuous domains, we developed DomEx, which enables domain boundary predictors to detect discontinuous domains by assembling continuous domain segments. Discontinuous domains are predicted by matching the sequence profile of concatenated continuous domain segments against profiles from a single-domain library derived from SCOP, CATH, and Pfam. The matches are then filtered by similarity to library templates, a symmetric index score, and a profile-profile alignment score. DomEx recalled 32.3% of discontinuous domains with 86.5% precision when tested on 97 non-homologous protein chains containing 58 continuous and 99 discontinuous domains, with predicted domain segments required to lie within ±20 residues of the boundary definitions in CATH 3.5. On a benchmark of 29 discontinuous-domain chains, DomEx recalled 26.7% of discontinuous domains with 72.7% precision, whereas ThreaDom, our recently developed predictor and the state-of-the-art tool for detecting discontinuous domains, failed to predict any. Furthermore, combined with ThreaDom, the method ranked number one among 10 predictors. The source code and datasets are available at https://github.com/xuezhidong/DomEx.
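The segment-assembly step described above can be illustrated with a minimal sketch. It is not the DomEx implementation: the function name `candidate_discontinuous_domains`, the `max_segments` parameter, and the half-open `(start, end)` segment encoding are all hypothetical, and the sketch assumes the boundary predictor returns sorted, non-overlapping segments that tile the chain. It only enumerates concatenations of segments that are not all adjacent in sequence; profile construction, library matching, and score filtering are left out.

```python
from itertools import combinations


def candidate_discontinuous_domains(segments, sequence, max_segments=2):
    """Enumerate candidate discontinuous domains (illustrative only).

    segments: sorted, non-overlapping (start, end) pairs, 0-based and
              half-open, assumed to come from a domain boundary predictor.
    sequence: the full amino-acid sequence of the chain.
    Returns a list of (parts, concatenated_sequence) tuples.
    """
    candidates = []
    for k in range(2, max_segments + 1):
        for idx in combinations(range(len(segments)), k):
            # If every chosen segment directly follows the previous one,
            # the result is one contiguous stretch, not a discontinuous domain.
            if all(b - a == 1 for a, b in zip(idx, idx[1:])):
                continue
            parts = [segments[i] for i in idx]
            joined = "".join(sequence[s:e] for s, e in parts)
            candidates.append((parts, joined))
    return candidates


if __name__ == "__main__":
    toy_sequence = "AAAABBBBCCCC"  # hypothetical 12-residue chain
    toy_segments = [(0, 4), (4, 8), (8, 12)]
    for parts, joined in candidate_discontinuous_domains(toy_segments, toy_sequence):
        print(parts, joined)
```

In a full pipeline, each concatenated sequence would then be turned into a sequence profile and compared against the single-domain library; that part depends on tools and thresholds not specified here.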

Highlights

  • Proteins consist of one or several stable, compact, and autonomously folding substructures, which are referred to as domains

  • The average Matthews Correlation Coefficient (MCC), recall and precision for the training dataset with different TTS and TSI are shown in Fig 5A–5C, respectively

  • Two test benchmarks showed that DomEx worked with the boundary predictors, and was complementary to the discontinuous-domain detection method in ThreaDom

Introduction

Proteins consist of one or several stable, compact, and autonomously folding substructures, which are referred to as domains. Identifying protein domains plays an important role in determining protein structures by experimental methods, including Nuclear Magnetic Resonance (NMR) and X-ray crystallography [1,2]. It is also a preliminary step in computational methods of protein structure prediction [3,4,5]. Detailed knowledge of domains is essential to advancing our understanding of protein function and evolution [6,7].

