Abstract

The latest version of the CATH-Gene3D protein structure classification database has recently been released (version 4.1, http://www.cathdb.info). The resource comprises over 300 000 domain structures and over 53 million protein domains classified into 2737 homologous superfamilies, doubling the number of predicted protein domains in the previous version. The daily-updated CATH-B, which contains our very latest domain assignment data, provides putative classifications for over 100 000 additional protein domains. This article describes developments to the CATH-Gene3D resource in the two years since the 2015 publication, including: significant increases in our structural and sequence coverage; expansion of the functional families in CATH; a support vector machine (SVM) built to automatically assign domains to superfamilies; improved search facilities that return alignments of query sequences against multiple sequence alignments; and the redesign of the web pages and download site.

Highlights

  • CATH-Gene3D, established in the mid-1990s, is a publicly accessible online resource providing a protein domain structure classification [1]

  • We provide access to the very latest putative CATH annotations in CATH-B

  • For every CATH domain superfamily with two or more functional families (FunFams), a superfamily network has been constructed in which FunFams are represented by nodes and edge distances correspond to the sequence similarity between the FunFam hidden Markov models (HMMs), assessed using profile comparer (PRC) [16]
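
The superfamily network described in the last highlight can be pictured as a weighted, undirected graph. The following is a minimal sketch only, not the CATH-Gene3D implementation: it assumes the pairwise FunFam HMM similarity scores have already been computed (for example with PRC) and does not call PRC itself. The function name `build_funfam_network`, the FunFam identifiers, the score values and the `min_score` cutoff are all hypothetical and chosen purely for illustration.

```python
def build_funfam_network(pairwise_scores, min_score=0.0):
    """Build an undirected weighted graph from pairwise similarity scores.

    pairwise_scores: dict mapping (funfam_a, funfam_b) -> similarity score.
    min_score: hypothetical cutoff; pairs scoring below it get no edge.
    Returns a dict: node -> {neighbour: score}.
    """
    network = {}
    for (a, b), score in pairwise_scores.items():
        # Every FunFam becomes a node, even if it ends up with no edges.
        network.setdefault(a, {})
        network.setdefault(b, {})
        if a == b or score < min_score:
            continue
        # Store the edge in both directions because the graph is undirected.
        network[a][b] = score
        network[b][a] = score
    return network


# Hypothetical FunFam identifiers and similarity scores (illustrative only).
scores = {
    ("FF1", "FF2"): 25.0,
    ("FF1", "FF3"): 4.5,
    ("FF2", "FF3"): 12.0,
}
net = build_funfam_network(scores, min_score=10.0)
for node in sorted(net):
    print(node, sorted(net[node].items()))
```

Keeping every FunFam as a node even when all of its scores fall below the cutoff means that isolated families remain visible in the network rather than silently disappearing.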


Summary

Introduction

CATH-Gene3D, established in the mid-1990s, is a publicly accessible online resource providing a protein domain structure classification [1] (http://www.cathdb.info). Additional protein domain sequences with no known structure are identified from UniProtKB [3] and Ensembl [4] protein sequences and classified within our sister resource Gene3D [5]. The CATH-Gene3D resource is contributing to a current ELIXIR research programme, EXCELERATE, in which the CATH-Gene3D HMMs are being used to assign domain structure and function annotations to metagenome sequences in the marine metagenome data use case.

