OCPAT: an online codon-preserved alignment tool for evolutionary genomic analysis of protein coding sequences.

Guozhen Liu, Munirul Islam, Lawrence I Grossman, Derek E Wildman, Monica Uddin, Roberto Romero, Morris Goodman

doi:10.1186/1751-0473-2-5

Abstract

Background

Rapidly accumulating genome sequence data from multiple species offer powerful opportunities for the detection of DNA sequence evolution. Phylogenetic tree construction and codon-based tests for natural selection are the prevailing tools used to detect functionally important evolutionary change in protein coding sequences. These analyses often require multiple DNA sequence alignments that maintain the correct reading frame for each collection of putative orthologous sequences. Because this feature is not available in most alignment tools, codon reading frames often must be checked manually before evolutionary analyses can begin.

Results

Here we report an online codon-preserved alignment tool (OCPAT) that automatically generates multiple sequence alignments from the coding sequences of any list of human gene IDs and their putative orthologs from the genomes of other vertebrate tetrapods. OCPAT is programmed to extract putative orthologous genes from genomes and to align the orthologs with the reading frame maintained in all species. OCPAT also optimizes each alignment by trimming the most variable regions at the 5' and 3' ends of each gene. The alignments are returned in several formats, which facilitates further molecular evolutionary analyses with appropriate available software. Alignments are generally robust and reliable, retaining the correct reading frame. The tool can serve as the first step for comparative genomic analyses of protein-coding gene sequences, including phylogenetic tree reconstruction and detection of natural selection. We aligned 20,658 human RefSeq mRNAs using OCPAT. Most alignments are missing sequence(s) from at least one species; however, functional annotation clustering of the ~1,700 transcripts that were alignable to all species shows that genes involved in multi-subunit protein complexes are highly conserved.

Conclusion

The OCPAT program facilitates large-scale evolutionary and phylogenetic analyses of entire biological processes, pathways, and diseases.
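The abstract states that OCPAT trims the most variable regions at the 5' and 3' ends, but the trimming criterion is not described here. The Python sketch below shows one way end-trimming could be done without breaking the reading frame: each codon column is scored by the fraction of rows carrying its most common ungapped codon, and whole codon columns are removed from each end until a column reaches a threshold. The scoring rule, the `threshold=0.5` default, and the function names are assumptions for illustration, not OCPAT's implementation.

```python
from collections import Counter


def codon_column_score(seqs, start):
    """Fraction of rows carrying the most common ungapped codon at this codon column."""
    codons = [s[start:start + 3] for s in seqs]
    ungapped = [c for c in codons if "-" not in c]
    if not ungapped:
        return 0.0
    top_count = Counter(ungapped).most_common(1)[0][1]
    return top_count / len(seqs)  # gapped rows count against conservation


def trim_variable_ends(seqs, threshold=0.5):
    """Remove whole codon columns from both ends until a conserved column is reached.

    Hypothetical criterion: a column counts as conserved if its score >= threshold.
    Trimming in steps of three keeps every row in its original reading frame.
    """
    length = len(seqs[0])
    if length % 3 or any(len(s) != length for s in seqs):
        raise ValueError("expected an in-frame alignment with rows of equal length")
    n_codons = length // 3
    first = 0
    while first < n_codons and codon_column_score(seqs, 3 * first) < threshold:
        first += 1
    last = n_codons
    while last > first and codon_column_score(seqs, 3 * (last - 1)) < threshold:
        last -= 1
    return [s[3 * first:3 * last] for s in seqs]


aln = ["TTTATGGCTGCAGGG", "CCCATGGCTGCAAAA", "---ATGGCCGCATTT"]
print(trim_variable_ends(aln))  # ['ATGGCTGCA', 'ATGGCTGCA', 'ATGGCCGCA']
```

Only the ends are trimmed; a poorly conserved column in the interior (such as the GCT/GCC column above, were it below threshold) would be left in place, which matches the abstract's focus on the 5' and 3' ends.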

Highlights

Accumulating genome sequence data from multiple species offer powerful opportunities for the detection of DNA sequence evolution
A necessary step in such analyses is the construction of in-frame multiple sequence alignments
Because commonly used alignment tools such as CLUSTAL and TCOFFEE [2,3] do not retain reading frame information, achieving in-frame alignments usually requires manual curation, which is impractical at a genome-wide scale
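A widely used way to obtain an in-frame nucleotide alignment is to align the translated proteins and then thread each coding sequence back through its aligned protein, one codon per residue and `---` per gap. This summary does not say whether OCPAT's core alignment works exactly this way, so the Python sketch below illustrates that general technique only; `back_translate` and its error handling are hypothetical. It uses the standard genetic code and checks each codon against the residue it should encode.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def back_translate(aligned_protein, cds):
    """Hypothetical helper: map a gapped protein row onto its coding sequence.

    Each residue is replaced by its codon and each '-' by '---', so the output
    is always a multiple of three long and stays in frame.
    """
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("CDS length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = aligned_protein.replace("-", "")
    # Drop a terminal stop codon that has no counterpart in the protein.
    if len(codons) == len(residues) + 1 and CODON_TABLE.get(codons[-1]) == "*":
        codons = codons[:-1]
    if len(codons) != len(residues):
        raise ValueError("CDS and protein lengths disagree")
    out = []
    codon_iter = iter(codons)
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
            continue
        codon = next(codon_iter)
        if CODON_TABLE.get(codon, "X") != aa:
            raise ValueError(f"codon {codon} does not encode {aa}")
        out.append(codon)
    return "".join(out)


print(back_translate("MAKV", "ATGGCTAAAGTTTAA"))  # ATGGCTAAAGTT
print(back_translate("M-KV", "ATGAAGGTC"))        # ATG---AAGGTC
```

Because all protein rows already share one column layout, back-translating every row yields codon columns that line up across species, and any CDS that does not translate to its protein is rejected rather than silently shifted out of frame.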

Summary

Results

Using OCPAT, we generated 20,658 multiple sequence alignments derived from human mRNA RefSeq IDs. We found that mammal species whose genomes were sequenced at 2-fold coverage had fewer recovered orthologs than did mammals with higher-quality sequences. Despite these limitations, there are 1,698 human RefSeqs for which we were able to obtain putative orthologs from all taxa queried (N = 13; the platypus Genebuild was not available as of Nov. 2, 2006). To explore the biological significance of the genes found in all species, we conducted a functional annotation clustering analysis using the default settings of the DAVID package [27]. The results of this analysis indicated a statistically significant over-representation of genes that encode proteins found in multi-subunit complexes (n = 263 RefSeqs; p = 5.0E-39). We consider the presence of putative orthologs in all species to be a good indicator of conservation (i.e., more identifiable orthologs indicate more functional constraint on the protein). Taken together, these results suggest that protein-protein interactions in multi-subunit complexes are under considerable evolutionary constraint. Mutations in these proteins are possibly more likely to be harmful when they occur.
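Identifying the RefSeqs with putative orthologs in every queried taxon, and tallying how often each taxon's ortholog was recovered, is simple bookkeeping once ortholog calls are available. The Python sketch below assumes a hypothetical mapping from each RefSeq ID to the set of taxa with a putative ortholog; the IDs and taxon names in the usage example are placeholders, not the 13 taxa used in the paper.

```python
from collections import Counter


def summarize_recovery(orthologs, taxa):
    """Hypothetical bookkeeping over ortholog calls.

    orthologs maps a RefSeq ID to the set of taxa with a putative ortholog.
    Returns the IDs found in every queried taxon and a per-taxon recovery count.
    """
    required = set(taxa)
    complete = sorted(rid for rid, found in orthologs.items() if required <= set(found))
    per_taxon = Counter(t for found in orthologs.values() for t in set(found) if t in required)
    return complete, {t: per_taxon[t] for t in taxa}


# Placeholder IDs and taxa for illustration only.
calls = {
    "refseq_1": {"human", "mouse", "dog"},
    "refseq_2": {"human", "mouse"},
    "refseq_3": {"human", "dog", "mouse"},
}
complete, recovered = summarize_recovery(calls, ["human", "mouse", "dog"])
print(complete)   # ['refseq_1', 'refseq_3']
print(recovered)  # {'human': 3, 'mouse': 3, 'dog': 2}
```

In a tally like `recovered`, a taxon with a low-coverage genome would be expected to show a smaller count, consistent with the observation above about 2-fold coverage mammals.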

Background

Ortholog extraction

Error correction

Determination of reading frame

Core alignment

Output

Discussion

Conclusion

Journal: Source Code for Biology and Medicine
Publication Date: Sep 18, 2007
Citations: 21
License type: CC BY
