Abstract

Zincins, the zinc-dependent metalloproteases with a His-Glu-x-x-His (HExxH) active-site motif, are a broad group of proteins involved in many metabolic and regulatory functions and found in all forms of life. The human genome contains more than 100 genes encoding proteins with known zincin-like domains. A survey of all proteins containing the HExxH motif shows that approximately 52% of HExxH occurrences fall within known protein structural domains (as defined in the Pfam database). Domain families in which a majority of members possess a conserved HExxH motif include, not surprisingly, many known and putative metalloproteases. Furthermore, several HExxH-containing protein domains identified in this way can be confidently predicted to be peptidases of the zincin fold. On this basis, we predict a zincin-like fold for eight uncharacterised Pfam families. Besides the domains in which the HExxH motif is strictly conserved, and those in which it occurs only sporadically, we identify intermediate families that contain some members with a conserved HExxH motif but also many homologues with substitutions at the conserved positions. Such substitutions can be evolutionarily conserved and non-random, yet the functional roles of these inactive zincins are not known. The CLCAs are a novel zincin-like protease family with many cases of substituted active sites. We show that this allegedly metazoan family has a number of bacterial and archaeal members. The extremely patchy phylogenetic distribution of CLCAs in prokaryotes, together with their conserved protein domain composition, strongly suggests an evolutionary scenario of horizontal gene transfer (HGT) from multicellular eukaryotes to bacteria, providing an example of eukaryote-derived xenologues in bacterial genomes. Additionally, in a protein family identified here as closely homologous to CLCA, the CLCA_X (CLCA-like) family, a number of proteins are found in phages and plasmids, supporting the HGT scenario.

Highlights

  • The protein sequence space, recently becoming sampled more and more densely thanks to genomic and metagenomic sequencing projects, has undoubtedly ‘granular’ features, and can be classified using various algorithms and classification systems [1,2]

  • We checked whether the occurrences of this motif fell within known Pfam protein domains (Pfam database version 24.0) or outside them

  • After removal of redundancy in the hit sequence set at 90% sequence identity, the HExxH motif occurred 47794 times within Pfam domains (52% of all occurrences) and 43395 times outside Pfam domains
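
The motif survey summarised in the highlights above can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy sequence, and the representation of domain annotations as 0-based half-open (start, end) intervals are assumptions made for illustration only. The sketch finds every HExxH occurrence in a sequence, sorts the hits into those lying fully inside an annotated domain and those outside, and then reproduces the 52% figure from the two counts reported in the text.

```python
import re

# HExxH: His, Glu, any two residues, His.
# The lookahead wrapper lets the scan report overlapping occurrences too.
HEXXH = re.compile(r"(?=(HE..H))")


def find_hexxh(seq):
    """Return the 0-based start position of every HExxH occurrence in seq."""
    return [m.start() for m in HEXXH.finditer(seq.upper())]


def classify_hits(seq, domains):
    """Split motif hits into those inside and those outside annotated domains.

    domains is a list of (start, end) 0-based half-open intervals.
    This format is a hypothetical stand-in for real Pfam annotations.
    A hit counts as "inside" only if all five motif residues lie in one domain.
    """
    inside, outside = [], []
    for pos in find_hexxh(seq):
        motif_end = pos + 5
        if any(start <= pos and motif_end <= end for start, end in domains):
            inside.append(pos)
        else:
            outside.append(pos)
    return inside, outside


def fraction_inside(n_inside, n_outside):
    """Fraction of all motif occurrences that fall within domains."""
    return n_inside / (n_inside + n_outside)


# Toy sequence with two motif occurrences and one made-up domain interval.
toy_seq = "MKTAHEIGHALGLLHEWWHSQ"
toy_domains = [(2, 12)]
inside, outside = classify_hits(toy_seq, toy_domains)

# Counts reported in the text (after redundancy removal at 90% identity).
reported = fraction_inside(47794, 43395)
```

With the toy input, the first motif (HEIGH) falls inside the made-up domain and the second (HEWWH) falls outside it. Applying `fraction_inside` to the reported counts gives 47794 / 91189, which rounds to the 52% quoted in the text.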

Introduction

The protein sequence space, which is being sampled ever more densely thanks to genomic and metagenomic sequencing projects, undoubtedly has 'granular' features and can be classified using various algorithms and classification systems [1,2]. It also has features of continuity, with very distant sequence similarities discovered between protein families hitherto considered unrelated, and local structural similarities found between members of different folds [3,4]. Within the generic class of proteases, distinct clans have been identified using the catalytic mechanism and three-dimensional fold as classifiers [18,19]. Proteins containing zincin-like domains often feature a complex domain composition that reflects their biological functions [23].
