Abstract

Background: Protein domains have long been an ill-defined concept in biology. They are generally described as autonomous folding units with evolutionary and functional independence. Both structure-based and sequence-based domain definitions have been widely used. But whether these types of models alone can capture all essential features of domains is still an open question.

Methods: Here we provide insight on domain definitions through comparative mapping of two domain classification databases, one sequence-based (Pfam) and the other structure-based (SCOP). A mapping score is defined to indicate the significance of the mapping, and the properties of the mapping matrices are studied.

Results: The mapping results show a general agreement between the two databases, as well as many interesting areas of disagreement. In the cases of disagreement, the functional and evolutionary characteristics of the domains are examined to determine which domain definition is biologically more informative.

Highlights

  • Protein domains have long been an ill-defined concept in biology

  • Materials: All Protein Data Bank (PDB) protein sequences, based on PDB SEQRES records, with less than 95% identity to each other were downloaded from the ASTRAL Compendium [18,19]

  • Domain mapping: A total of 2081 Pfam families and 2512 SCOP domain families are defined in the set of 8259 PDB protein chains
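
As a rough illustration of what a mapping matrix between the two classifications could look like, the sketch below counts, for each (Pfam family, SCOP family) pair, the number of chains in which the two assignments overlap. This is not the mapping score used in the paper, whose definition is not given here; the function names, the `min_overlap` threshold, the input layout, and all family and chain identifiers are hypothetical and chosen only for illustration.

```python
from collections import Counter


def overlap(a_start, a_end, b_start, b_end):
    """Number of residues shared by two inclusive residue ranges."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)


def mapping_counts(pfam, scop, min_overlap=1):
    """Count chains in which a Pfam family and a SCOP family overlap.

    pfam, scop: dict mapping chain id -> list of (family, start, end).
    min_overlap is an assumed threshold, not a value from the paper.
    Returns a Counter keyed by (pfam_family, scop_family).
    """
    counts = Counter()
    for chain_id, pfam_regions in pfam.items():
        pairs = set()  # count each family pair at most once per chain
        for p_fam, p_start, p_end in pfam_regions:
            for s_fam, s_start, s_end in scop.get(chain_id, []):
                if overlap(p_start, p_end, s_start, s_end) >= min_overlap:
                    pairs.add((p_fam, s_fam))
        counts.update(pairs)
    return counts


# Toy annotations with made-up identifiers (not real Pfam or SCOP entries).
pfam = {
    "chainA": [("PF_X", 1, 100), ("PF_Y", 120, 200)],
    "chainB": [("PF_X", 5, 90)],
}
scop = {
    "chainA": [("scop_1", 1, 110), ("scop_2", 111, 210)],
    "chainB": [("scop_1", 1, 95)],
}

counts = mapping_counts(pfam, scop)
for (p_fam, s_fam), n in sorted(counts.items()):
    print(p_fam, s_fam, n)
```

Raw co-occurrence counts like these are only a starting point; a significance score would additionally need some normalization, for example by how often each family occurs in the chain set.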


Summary

Introduction

Protein domains have long been an ill-defined concept in biology. They are generally described as autonomous folding units with evolutionary and functional independence. Both structure-based and sequence-based domain definitions have been widely used. Protein domains are generally considered to be protein fragments with common structures that may fold independently [4] or have their own functions [5]. They have also been treated as evolutionary units [6]. Classifying proteins based on their constituent domains is one of the most effective and efficient approaches to organizing protein data, both by structure and by evolutionary relationship. The challenge lies in the ambiguity of domain definitions, as well as the lack of useful structural information about most proteins.
