Abstract

P-POD, the Princeton Protein Orthology Database, classifies proteins from model organisms and medically-important organisms into families of homologs and provides curated evidence from the literature addressing these relationships. The web page for each protein family includes a phylogenetic tree, sequence alignment, and cross-references to disease-related papers from SGD, papers describing complementation experiments, and OMIM gene and disease information.

As participants in the Gene Ontology Consortium’s Reference Genome project, we seek to provide a consistent centralized method to identify orthologous proteins. We have expanded P-POD to include the protein complement of the twelve Reference Genomes. In addition, we have added new tools and search options to provide greater depth, breadth, and flexibility. Users may view families from multiple analyses generated by different methods and/or based on different sets of proteins. Using Notung, a software package that uses duplication-loss parsimony to resolve uncertainty in protein family trees, we have improved P-POD’s phylogenetic trees by fitting the protein trees to an established species phylogeny and annotating them with duplications and losses. A Notung applet on the P-POD web site identifies orthologous and paralogous relationships within each family and allows users to perform custom analyses on the phylogenies. These improvements and others make P-POD, in conjunction with the PANTHER database, an ideal tool for predicting the function of new, uncharacterized genes on the basis of their orthologous relationships to characterized ones.

All the data in P-POD are freely and publicly available through the web and by downloading the entire database system via the URL "http://ortholog.princeton.edu/". This work is supported by supplemental funds (Kara Dolinski, subcontract PI) to NHGRI grant HG002273 (PIs Judith Blake, Michael Ashburner, J. Michael Cherry and Suzanna Lewis).

Highlights

  • P-POD, the Princeton Protein Orthology Database, provides a convenient, centralized resource to help researchers infer protein function

  • As part of a new effort for the Reference Genome Project, OrthoMCL results from P-POD are being integrated with larger PANTHER protein families to map Gene Ontology (GO) terms from annotated proteins to their unannotated homologs

  • Several new computational features assist in this effort


Summary

Direct annotation

Clades of the MSH2 superfamily, from PANTHER family PTHR11361: MutS2 clade (outgroup: bacteria, archaea, plants). Example 1: Propagation of a GO term from and to multiple members of a homolog family. Three MSH2 homologs are annotated either directly to the molecular function “mismatched DNA binding” or to one of its children. The common ancestor of these three proteins is the common ancestor of all homologs, so the term can be propagated to all family members. Eight family members already have IEA annotations to this term or a child term, but five have none at all.
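The propagation rule in Example 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not P-POD's or PANTHER's actual pipeline: the toy tree, the protein names (`p1` to `p4`), and the helper functions are all hypothetical, and the sketch treats the GO term as a single label, ignoring child terms, evidence codes, and curator judgment.

```python
# Hypothetical toy family tree stored as child -> parent.
# Node names are illustrative only; they are not taken from PTHR11361.
PARENT = {
    "root": None,
    "cladeA": "root",
    "cladeB": "root",
    "p1": "cladeA",
    "p2": "cladeA",
    "p3": "cladeB",
    "p4": "cladeB",
}


def path_to_root(node, parent):
    """Return the node followed by its ancestors, ordered from node to root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def mrca(nodes, parent):
    """Return the most recent common ancestor of the given nodes."""
    first, *rest = nodes
    common = path_to_root(first, parent)  # ordered from leaf to root
    for n in rest:
        ancestors = set(path_to_root(n, parent))
        common = [a for a in common if a in ancestors]
    return common[0]  # deepest node shared by every path


def leaves_under(ancestor, parent):
    """Return all leaves (proteins) that descend from the given ancestor."""
    internal = {p for p in parent.values() if p is not None}
    return {
        n for n in parent
        if n not in internal and ancestor in path_to_root(n, parent)
    }


def propagate(annotated, parent):
    """Return unannotated proteins that would inherit the term.

    The term is pushed up to the common ancestor of the annotated
    proteins and then down to every other leaf below that ancestor.
    """
    ancestor = mrca(annotated, parent)
    return leaves_under(ancestor, parent) - set(annotated)
```

With annotated proteins in both clades, for example `propagate(["p1", "p3"], PARENT)`, the common ancestor is the root, so every remaining family member inherits the term, which mirrors the situation described in Example 1. If the annotated proteins all sit in one clade, the term reaches only that clade.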

Existing annotation
Curator Notes
PHYLIP to generate phylogenetic trees