Abstract

Gene3D (http://gene3d.biochem.ucl.ac.uk) is a database of domain annotations of Ensembl and UniProtKB protein sequences. Domains are predicted using a library of profile HMMs representing 2737 CATH superfamilies. Gene3D has previously featured in the Database issue of NAR, and here we report updates to the website and database. The current Gene3D (v14) release has expanded its domain assignments to ∼20 000 cellular genomes and over 43 million unique protein sequences, more than doubling the number of protein sequences since our last publication. Amongst other updates, we have improved our Functional Family annotation method. We have also improved the quality and coverage of our 3D homology modelling pipeline for predicted CATH domains. Additionally, the structural models have been expanded to include an extra model organism (Drosophila melanogaster). We also document a number of additional visualization tools in the Gene3D website.

Highlights

  • Protein structural domains are compact structural modules within proteins and can be grouped into sometimes very large clusters of relatives showing clear evolutionary relatedness, termed homologous superfamilies

  • Work has been carried out to improve the functional purity of domain assignments by dividing the domain superfamilies into smaller, functionally coherent groups termed Functional Families (FunFams), a functional sub-classification of CATH superfamily assignments [3]

  • These FunFams greatly improve the ability to interpret the functions of an experimentally uncharacterized protein based on its domain assignments [3,4]

Introduction

Protein structural domains are compact structural modules within proteins and can be grouped into sometimes very large clusters of relatives showing clear evolutionary relatedness, termed homologous superfamilies. Work has been carried out to improve the functional purity of domain assignments by dividing the domain superfamilies into smaller, functionally coherent groups termed Functional Families (FunFams), a functional sub-classification of CATH superfamily assignments [3]. These FunFams greatly improve the ability to interpret the functions of an experimentally uncharacterized protein based on its domain assignments [3,4]. The Gene3D resource predicts domain superfamily assignments for tens of millions of protein sequences in UniProtKB [5] and Ensembl [6,7] using HMMER3 sequence comparison tools [8] to match against the expertly curated structural domains in CATH. In addition to domain superfamilies, Gene3D provides the more functionally coherent FunFam assignments.
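To make the idea of HMM-based domain assignment concrete, the following is a minimal sketch of how per-domain hits from a HMMER3 scan might be collected for each protein. It is not the Gene3D pipeline. It assumes the hits come from `hmmscan --domtblout`, where the target column holds the profile HMM and the query column holds the protein sequence. The function name `parse_domtblout`, the `DomainHit` record and the E-value cut-off of 1e-3 are illustrative choices, not values taken from the paper.

```python
from collections import defaultdict
from typing import NamedTuple


class DomainHit(NamedTuple):
    query: str       # protein sequence identifier
    model: str       # name of the matched profile HMM
    start: int       # envelope start on the sequence (1-based)
    end: int         # envelope end on the sequence (inclusive)
    i_evalue: float  # independent E-value of this domain


def parse_domtblout(lines, max_evalue=1e-3):
    """Group per-domain hits by query sequence, keeping significant ones.

    `lines` is any iterable of text lines in HMMER3 domain table format.
    `max_evalue` is a hypothetical threshold chosen for illustration.
    """
    hits = defaultdict(list)
    for line in lines:
        # Skip blank lines and the '#' comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        # A data row has 22 fixed columns followed by a free-text description.
        if len(fields) < 22:
            continue
        hit = DomainHit(
            query=fields[3],
            model=fields[0],
            start=int(fields[19]),   # "env from" column
            end=int(fields[20]),     # "env to" column
            i_evalue=float(fields[12]),
        )
        if hit.i_evalue <= max_evalue:
            hits[hit.query].append(hit)
    # Order each protein's domains by their position along the sequence.
    return {q: sorted(h, key=lambda d: d.start) for q, h in hits.items()}
```

Calling `parse_domtblout(open(path))` on a domain table file would return a dictionary mapping each protein identifier to its ordered list of domain hits. A real assignment step would also need to resolve overlapping hits, which this sketch leaves out.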
