Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics

Matteo Rè, Graziano Pesole, David S Horner

doi:10.1186/1471-2105-10-282

Abstract

Background: The conservation of sequences between related genomes has long been recognised as an indication of functional significance, and recognition of sequence homology is one of the principal approaches used in the annotation of newly sequenced genomes. Given recent findings that the number of non-coding transcripts in higher organisms is likely to be much higher than previously imagined, discrimination between conserved coding and non-coding sequences is a topic of considerable interest. It is also desirable to make this discrimination without recourse to sequence similarity searches of protein databases: such approaches cannot identify novel conserved proteins that lack characterized homologs, and they may be influenced by database sequences that are erroneously annotated as coding.

Results: Here we present a machine learning-based approach for the discrimination of conserved coding sequences. Our method calculates various statistics related to the evolutionary dynamics of two aligned sequences. These features are considered by a Support Vector Machine, which designates the alignment coding or non-coding with an associated probability score.

Conclusion: We show that our approach is both sensitive and accurate with respect to comparable methods and illustrate several situations in which it may be applied, including the identification of conserved coding regions in genome sequences and the discrimination of coding from non-coding cDNA sequences.

Summary

Introduction

The conservation of sequences between related genomes has long been recognised as an indication of functional significance, and recognition of sequence homology is one of the principal approaches used in the annotation of newly sequenced genomes. While it is probable that representatives of most gene families found in nature have been characterized (at least at the sequence level), lineage-specific gene families, genes, and exons are not uncommon (e.g. [4,5,6]). Such exons may be incorporated into messages by alternative splicing and may not be recovered by ab initio predictors as components of optimal gene models. In this context, comparisons between relatively closely related genomes can permit identification of novel exons or coding genes that exhibit low levels of similarity to annotated proteins [7]. Such methods do not rely on the annotation of homologous sequences or the conservation of specific functional signals.
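The abstract describes computing statistics on the evolutionary dynamics of two aligned sequences and passing them to a classifier. The sketch below illustrates what such statistics might look like; it is not the authors' feature set. The function name `alignment_features`, the three features, and the assumption that codon position can be read directly from the alignment column index are all hypothetical choices made for illustration. The underlying intuition is that coding alignments tend to concentrate substitutions at third codon positions and to avoid gaps that disrupt the reading frame.

```python
from typing import Dict


def alignment_features(seq_a: str, seq_b: str, frame: int = 0) -> Dict[str, float]:
    """Compute three illustrative statistics from a pairwise alignment.

    Assumption (not from the paper): codon position is taken from the
    alignment column index, which is only meaningful when gaps do not
    shift the reading frame.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    mismatches = [0, 0, 0]  # substitutions at codon positions 1, 2, 3
    compared = 0            # ungapped columns
    identical = 0           # ungapped columns with the same base
    gap_runs = []           # lengths of consecutive gapped-column runs
    run = 0

    for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper())):
        if a == "-" or b == "-":
            run += 1
            continue
        if run:
            gap_runs.append(run)
            run = 0
        compared += 1
        if a == b:
            identical += 1
        else:
            mismatches[(i - frame) % 3] += 1
    if run:
        gap_runs.append(run)

    total_mismatches = sum(mismatches)
    frameshifting = sum(1 for length in gap_runs if length % 3 != 0)

    return {
        # fraction of ungapped columns that are identical
        "identity": identical / compared if compared else 0.0,
        # share of substitutions falling at the third codon position
        "third_position_share": (
            mismatches[2] / total_mismatches if total_mismatches else 0.0
        ),
        # fraction of gap runs whose length is not a multiple of three
        "frameshift_gap_fraction": (
            frameshifting / len(gap_runs) if gap_runs else 0.0
        ),
    }


if __name__ == "__main__":
    print(alignment_features("ATGGCTAAAGGT", "ATGGCCAAGGGA"))
```

A feature vector of this kind could then be given to a probabilistic classifier, for example scikit-learn's `SVC(probability=True)`, trained on alignments of known coding and non-coding regions. That pairing is one plausible way to reproduce the general design the abstract outlines, not a description of the published implementation.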

Journal: BMC Bioinformatics	Publication Date: Sep 8, 2009
Citations: 31	License type: CC BY 2.0
