Abstract

Background

The reliable dissection of large proteins into structural domains represents an important issue for structural genomics/proteomics projects. To provide a practical approach to this issue, we tested the ability of a neural network to identify domain linkers in the SWISSPROT database (101,602 sequences).

Results

Our search detected 3009 putative domain linkers adjacent to or overlapping with domains, as defined by sequence similarity to either Protein Data Bank (PDB) or Conserved Domain Database (CDD) sequences. Of these putative linkers, 75% were "correctly" located within 20 residues of a domain terminus. The remaining 25% were found in the middle of a domain and probably represented failed predictions. Moreover, our neural network predicted 5124 putative domain linkers in structurally unannotated regions without sequence similarity to PDB or CDD sequences, which suggests the possible existence of novel structural domains. For comparison, we performed the same analysis on low-complexity regions (LCRs), which are known to encode unstructured polypeptide segments, and observed that the fraction of LCRs that correlate with domain termini is similar to that of domain linkers. However, domain linkers and LCRs appeared to identify different types of domain boundary regions, as only 32% of the putative domain linkers overlapped with LCRs.

Conclusion

Overall, our study indicates that the two methods detect independent and complementary regions, and that combining them can substantially improve the sensitivity of domain boundary prediction. This finding should enable the identification of novel structural domains, yielding new targets for large-scale protein analyses.
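The two headline statistics in the abstract, the fraction of predicted linkers lying within 20 residues of a domain terminus and the fraction overlapping an LCR, reduce to simple interval comparisons. The following is a minimal Python sketch, not the authors' code. It assumes 1-based inclusive residue intervals and reads "within 20 residues" as the minimum distance between any linker residue and a domain start or end. The `Segment` type and all function names are hypothetical, and the domain and LCR intervals are assumed to be supplied from elsewhere (for example, PDB/CDD similarity searches and SEG output).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Segment:
    """A residue interval, 1-based and inclusive at both ends (assumed convention)."""
    start: int
    end: int


def distance_to_point(seg: Segment, pos: int) -> int:
    """Residues between `pos` and the nearest residue of `seg` (0 if `pos` lies inside)."""
    if seg.start <= pos <= seg.end:
        return 0
    return min(abs(seg.start - pos), abs(seg.end - pos))


def near_terminus(linker: Segment, domains: Sequence[Segment], window: int = 20) -> bool:
    """True if the linker lies within `window` residues of any domain start or end."""
    return any(
        distance_to_point(linker, terminus) <= window
        for d in domains
        for terminus in (d.start, d.end)
    )


def overlaps(a: Segment, b: Segment) -> bool:
    """True if two inclusive intervals share at least one residue."""
    return a.start <= b.end and b.start <= a.end


def summarize(linkers: Sequence[Segment], domains: Sequence[Segment],
              lcrs: Sequence[Segment], window: int = 20) -> dict:
    """Fractions of predicted linkers near a domain terminus and overlapping any LCR."""
    n = len(linkers)
    if n == 0:
        return {"near_terminus": 0.0, "overlaps_lcr": 0.0}
    near = sum(near_terminus(linker, domains, window) for linker in linkers)
    with_lcr = sum(any(overlaps(linker, r) for r in lcrs) for linker in linkers)
    return {"near_terminus": near / n, "overlaps_lcr": with_lcr / n}


if __name__ == "__main__":
    domains = [Segment(1, 120), Segment(140, 260)]
    linkers = [Segment(118, 135), Segment(60, 70), Segment(255, 262)]
    lcrs = [Segment(125, 132)]
    print(summarize(linkers, domains, lcrs))
```

In this toy input, two of the three linkers touch a domain terminus (residues 120 and 260), and only the first overlaps the LCR. The linker at 60-70 plays the role of a mid-domain "failed prediction".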

Highlights

  • The reliable dissection of large proteins into structural domains represents an important issue for structural genomics/proteomics projects

  • Detection of putative domain linkers by the neural network: in many applications, including ours, it is critical to reduce the number of false positives because of their experimental cost, whereas false negatives are less detrimental

  • Our study strongly suggests that sequence characteristics alone, as detected by either our neural network or SEG, can identify domain boundaries in protein sequences even without sequence similarity to existing domain databases
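SEG identifies low-complexity regions by measuring the compositional complexity of a sliding window along the sequence. The sketch below is a deliberately simplified stand-in loosely modeled on that idea, not SEG itself. It flags every residue covered by a 12-residue window whose Shannon entropy falls below a single 2.2-bit cutoff, and it omits SEG's extension and trimming steps. The window length and cutoff are assumptions chosen to resemble SEG's commonly used trigger settings, and `low_complexity_regions` is a hypothetical name.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy, in bits, of the residue composition of `window`."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def low_complexity_regions(seq: str, window: int = 12, cutoff: float = 2.2) -> list:
    """Merged 1-based inclusive (start, end) intervals covered by any window whose
    entropy falls below `cutoff`. A simplified stand-in for SEG: no extension
    phase and no trimming to an optimal subsequence."""
    flagged = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) < cutoff:
            for j in range(i, i + window):
                flagged[j] = True

    # Merge runs of consecutive flagged residues into intervals.
    regions = []
    start = None
    for i, is_low in enumerate(flagged):
        if is_low and start is None:
            start = i
        elif not is_low and start is not None:
            # 0-based index i is the first unflagged residue, so the 1-based end is i.
            regions.append((start + 1, i))
            start = None
    if start is not None:
        regions.append((start + 1, len(seq)))
    return regions


if __name__ == "__main__":
    complex_part = "ACDEFGHIKLMNPQRSTVWY"
    seq = complex_part + "P" * 20 + complex_part
    print(low_complexity_regions(seq))
```

For the example sequence, the poly-proline stretch (residues 21-40) is reported as a single region that extends a few residues into each flank. This is a known side effect of window-based flagging, and it is one reason SEG itself trims its raw hits.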


Summary

Introduction

The reliable dissection of large proteins into structural domains represents an important issue for structural genomics/proteomics projects. Structural genomics/proteomics projects seek to establish high-throughput techniques for routine protein structure determination by either X-ray crystallography or NMR spectroscopy [1,2,3,4,5,6,7]. However, the average length of the protein structures deposited in the PDB (Protein Data Bank) is about 230 residues. This situation reflects the difficulty of determining the structures of large proteins, as well as that of expressing and purifying them. Dissecting large proteins into their structural domains can therefore provide several candidates for swift structural analysis by either X-ray crystallography or NMR spectroscopy.
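Once linkers have been predicted, dissecting a large protein reduces to choosing cut points between domains. The following is a minimal sketch under stated assumptions, not part of the original method. Each predicted linker is cut at its midpoint, and fragments shorter than a hypothetical `min_len` of 50 residues are discarded. `split_at_linkers` is a hypothetical helper name.

```python
from typing import List, Sequence, Tuple


def split_at_linkers(seq_len: int, linkers: Sequence[Tuple[int, int]],
                     min_len: int = 50) -> List[Tuple[int, int]]:
    """Candidate domain fragments (1-based, inclusive) obtained by cutting at the
    midpoint of each predicted linker; fragments shorter than `min_len` are dropped."""
    fragments = []
    start = 1
    for l_start, l_end in sorted(linkers):
        cut = (l_start + l_end) // 2  # last residue of the fragment before this linker
        if cut - start + 1 >= min_len:
            fragments.append((start, cut))
        start = cut + 1  # the next fragment begins right after the cut
    if seq_len - start + 1 >= min_len:
        fragments.append((start, seq_len))
    return fragments


if __name__ == "__main__":
    print(split_at_linkers(400, [(118, 135), (255, 262)]))
```

For a 400-residue protein with linkers predicted at residues 118-135 and 255-262, the sketch returns `[(1, 126), (127, 258), (259, 400)]`. Cutting at the midpoint leaves part of each linker on both neighbouring fragments. Cutting at the linker ends instead would be an equally plausible choice.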

