Abstract

Author Summary

More than 40% of known proteins lack any annotation within public databases and are usually referred to as hypothetical proteins, despite most of them being real and many being evolutionarily conserved and thus expected to play important biological roles. Determination of the three-dimensional structures of representatives of more than 240 families of protein domains of unknown function by the Protein Structure Initiative has provided a unique sample of regions of the protein universe that, until this systematic effort, were completely uncharacterized. Analysis of these structures reveals that most of the 240 families can be considered remote homologs of already known protein families. Such distant evolutionary links can sometimes be predicted by current state-of-the-art sequence comparison tools, but structural analysis has led to the first hypotheses about biological functions for many of these uncharacterized proteins and serves as a starting point for experimental studies. The rapid pace of discovery of such relationships suggests that the protein universe is made up of a relatively small and stable number of ‘extended neighborhoods’ that bring together distantly related protein families. Thus, the vast uncharacterized part of the protein universe, called by some “the dark matter of protein space”, may consist mainly of highly divergent homologs. Continued structural characterization of these previously under-investigated regions of the protein universe should further help unravel the patterns and rules that led to such divergence in the evolution of protein structure and function.



Introduction

The sequences of several million proteins are currently known, and this number is growing ever more rapidly as a result of the relentless efficiency of genomic and metagenomic sequencing projects. “Hypothetical proteins” are not merely artifacts: many have been validated as gene products in function-based, genome-scale surveys, such as essentiality analysis [1,2], disease association studies [3,4,5], genome-wide DNA expression arrays [6,7,8], and cDNA and proteomics-based environmental surveys [9,10,11,12]. They are bona fide proteins that have not yet been the focus of any detailed study. The NIH Protein Structure Initiative (PSI; http://www.nigms.nih.gov/Initiatives/PSI/) has made a concerted and systematic effort to explore these uncharted regions of the protein universe as a means to uncover new insights into the evolution and diversity of protein structure and function.

