Protein Family Databases

Nicola J Mulder

doi:10.1002/9780470015902.a0003058.pub3

Abstract

As advances in sequencing technology continue to flood public databases with new protein sequences, protein family databases become increasingly important for the automatic functional classification of proteins. These databases are developed independently, and each has its own methods and areas of interest, as well as its own strengths and weaknesses. To simplify access to multiple databases, many of them have also been amalgamated into integrated protein family resources, which vary in their level of manual curation. Protein family databases and integrated resources have many applications in modern biology and bioinformatics. These include protein functional annotation, orthologue prediction, protein–protein interaction prediction, gene set enrichment analysis, and the provision of datasets for evaluating mathematical models of biological systems or networks.

Key Concepts:
- Protein signatures are mathematical descriptions of the sequence characteristics shared by members of the same protein family or domain.
- Profiles and hidden Markov models are tools for characterising protein families or domains.
- Regular expressions, or patterns, are used to describe short, highly conserved motifs.
- Protein family data have a number of applications, notably the functional classification of new protein sequences.
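Pattern-based signatures can be illustrated with a short sketch. It translates a simplified subset of the pattern syntax used by databases such as PROSITE into a Python regular expression and scans a sequence for matches. The supported subset is an assumption: single residues, `x` for any residue, `[...]` for allowed residues, `{...}` for forbidden residues, `(n)` or `(n,m)` repeat counts, and `<`/`>` for terminal anchors. The function names `prosite_to_regex` and `find_motifs` are hypothetical. The P-loop-like pattern and the sequence are made up for illustration and are not taken from any database entry.

```python
import re

# One element of a (simplified, assumed) PROSITE-style pattern: optional N-terminal
# anchor "<", a residue spec (single letter, "x", "[...]" or "{...}"),
# an optional repeat count "(n)" or "(n,m)", and an optional C-terminal anchor ">".
ELEMENT = re.compile(r"(<?)(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?(>?)")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        n_term, residue, low, high, c_term = m.groups()
        if residue == "x":
            core = "."  # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"  # any residue except those listed
        else:
            core = residue  # a single residue or a [..] class is already valid regex
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if n_term else "") + core + ("$" if c_term else ""))
    return "".join(parts)


def find_motifs(pattern: str, sequence: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched segment) for each start position that matches."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = []
    pos = 0
    # Restart the search one residue after each hit so overlapping matches are reported.
    while (m := regex.search(sequence, pos)) is not None:
        hits.append((m.start() + 1, m.group()))
        pos = m.start() + 1
    return hits


if __name__ == "__main__":
    # Illustrative P-loop-like pattern and a made-up sequence.
    print(prosite_to_regex("[AG]-x(4)-G-K-[ST]."))
    print(find_motifs("[AG]-x(4)-G-K-[ST].", "MAGPSGSGKSTLV"))
```

Running the script prints the regex `[AG].{4}GK[ST]` and a single hit, `(3, 'GPSGSGKS')`. The example also shows why patterns suit only short, highly conserved motifs. A match is all-or-nothing, so a single substitution at a fixed position makes the match fail. Profiles and hidden Markov models instead score every position, so they tolerate this kind of divergence.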
