Pseudofam: the pseudogene families database

Hugo Y K Lam, Kei-Hoi Cheung, Nicholas Carriero, Philip Cayting, Ekta Khurana, Gang Fang, Mark B Gerstein

doi:10.1093/nar/gkn758

Abstract

Pseudofam (http://pseudofam.pseudogene.org) is a database of pseudogene families based on the protein families from the Pfam database. It provides resources for analyzing the family structure of pseudogenes including query tools, statistical summaries and sequence alignments. The current version of Pseudofam contains more than 125 000 pseudogenes identified from 10 eukaryotic genomes and aligned within nearly 3000 families (approximately one-third of the total families in PfamA). Pseudofam uses a large-scale parallelized homology search algorithm (implemented as an extension of the PseudoPipe pipeline) to identify pseudogenes. Each identified pseudogene is assigned to its parent protein family, and the pseudogenes in a family are then aligned to one another by transferring the parent domain alignments from the Pfam family. Pseudogenes are also given additional annotation based on an ontology, reflecting their mode of creation and subsequent history. In particular, our annotation highlights the association of pseudogene families with genomic features, such as segmental duplications. In addition, pseudogene families are associated with key statistics, which identify outlier families with an unusual degree of pseudogenization. The statistics also show how the number of genes and pseudogenes in families correlates across different species. Overall, they highlight the fact that housekeeping families tend to be enriched with a large number of pseudogenes.
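
The alignment-transfer step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the Pseudofam implementation, and the function name `transfer_alignment` and its inputs are hypothetical. It assumes each pseudogene already has a pairwise alignment to its parent protein that covers the whole parent domain, and that the parent appears as a gapped row in the family alignment. Pseudogene residues are placed in the family-alignment columns of the parent residues they pair with, and pseudogene insertions relative to the parent are dropped because they have no column.

```python
def transfer_alignment(parent_family_row, parent_pair, pseudo_pair):
    """Project a pseudogene onto the columns of its parent's family alignment.

    parent_family_row: the parent as a gapped row of the family alignment.
    parent_pair, pseudo_pair: the two rows of a pairwise parent/pseudogene
        alignment (equal length, '-' for gaps).
    All names here are illustrative assumptions, not Pseudofam's API.
    """
    if len(parent_pair) != len(pseudo_pair):
        raise ValueError("pairwise alignment rows must have equal length")
    if parent_pair.replace("-", "") != parent_family_row.replace("-", ""):
        raise ValueError("parent sequence differs between the two alignments")

    # For every parent residue, record the pseudogene character paired with it.
    # Columns where the parent has a gap are pseudogene insertions; skip them.
    paired = [q for p, q in zip(parent_pair, pseudo_pair) if p != "-"]

    # Walk the family row: gap columns stay gaps, residue columns take the
    # pseudogene character paired with that parent residue.
    out = []
    i = 0
    for c in parent_family_row:
        if c == "-":
            out.append("-")
        else:
            out.append(paired[i])
            i += 1
    return "".join(out)


# Toy example: the pseudogene lost the parent's C and gained an X before E.
row = transfer_alignment("AC-DE-F", "ACD-EF", "A-DXEF")
print(row)
```

Because every pseudogene in a family is projected onto the same columns, the resulting rows can be stacked directly, so no new multiple alignment needs to be computed in this sketch.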

Journal: Nucleic Acids Research	Publication Date: Oct 28, 2008
Citations: 66	License type: CC BY-NC 2.0 UK

