An efficient genetic algorithm for structural RNA pairwise alignment and its application to non-coding RNA discovery in yeast

Akito Taneda

doi:10.1186/1471-2105-9-521

Abstract

Background: Aligning RNA sequences with low sequence identity has been a challenging problem, since such a computation essentially requires an algorithm of high complexity to take structural conservation into account. Although many sophisticated algorithms for this purpose have been proposed, further improvement in efficiency is necessary to accelerate large-scale applications such as non-coding RNA (ncRNA) discovery.

Results: We developed a new genetic algorithm, Cofolga2, that simultaneously computes a pairwise RNA sequence alignment and its consensus folding, and benchmarked it on BRAliBase 2.1. The benchmark results showed that the new algorithm is accurate and efficient in both time and memory usage. Combining it with an originally trained SVM, we then applied the algorithm to novel ncRNA discovery, comparing the S. cerevisiae genome with six related genomes in a pairwise manner. By focusing the search on relatively short regions (50 bp to 2,000 bp) sandwiched between conserved sequences, we successfully predicted 714 intergenic and 1,311 sense or antisense ncRNA candidates, which were found in pairwise alignments with a stable consensus secondary structure and low sequence identity (≤ 50%). Comparison with previous predictions showed that more than 92% of the candidates are novel. The estimated false-positive rate among the predicted candidates is 51%. Twenty-five percent of the intergenic candidates have support for expression in the cell, i.e. their genomic positions overlap those of experimentally determined transcripts reported in the literature. In addition, manual inspection of the results yielded four multiple alignments with low sequence identity that reveal consensus structures shared by three species/sequences.

Conclusion: The present method provides an efficient tool complementary to sequence-alignment-based ncRNA finders.
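Cofolga2 evolves a pairwise alignment together with its consensus folding. The details of its encoding and operators are not given here, so the following is only a minimal sketch of the genetic-algorithm machinery, applied to a plain sequence-only score. Everything in it is a hypothetical illustration and not Cofolga2's design: the encoding of an alignment as a strictly increasing list of matched columns (i, j), the linear scoring constants, truncation selection with elitism, one-point crossover, and add/remove mutation.

```python
import random

# Hypothetical linear scoring scheme (not Cofolga2's structure-aware score).
MATCH, MISMATCH, GAP = 2, -1, -1


def score(s, t, pairs):
    """Score an alignment encoded as a strictly increasing list of matched columns (i, j)."""
    sub = sum(MATCH if s[i] == t[j] else MISMATCH for i, j in pairs)
    # Every residue that is not in a matched column is aligned against a gap.
    return sub + GAP * (len(s) + len(t) - 2 * len(pairs))


def is_valid(pairs, n, m):
    """True if all columns are in range and strictly increasing in both sequences."""
    in_range = all(0 <= i < n and 0 <= j < m for i, j in pairs)
    increasing = all(p[0] < q[0] and p[1] < q[1] for p, q in zip(pairs, pairs[1:]))
    return in_range and increasing


def random_alignment(n, m, rng):
    """Pick k distinct positions in each sequence and pair them in order."""
    k = rng.randint(0, min(n, m))
    return list(zip(sorted(rng.sample(range(n), k)), sorted(rng.sample(range(m), k))))


def crossover(a, b, cut):
    """One-point crossover at position `cut` of s: prefix of a, then the compatible suffix of b."""
    child = [p for p in a if p[0] < cut]
    last_j = child[-1][1] if child else -1
    return child + [p for p in b if p[0] >= cut and p[1] > last_j]


def mutate(pairs, n, m, rng):
    """Delete a random matched column, or insert a new one that keeps the order."""
    pairs = list(pairs)
    if pairs and rng.random() < 0.5:
        del pairs[rng.randrange(len(pairs))]
        return pairs
    pos = rng.randint(0, len(pairs))  # insertion slot between existing columns
    lo_i = pairs[pos - 1][0] + 1 if pos > 0 else 0
    lo_j = pairs[pos - 1][1] + 1 if pos > 0 else 0
    hi_i = pairs[pos][0] - 1 if pos < len(pairs) else n - 1
    hi_j = pairs[pos][1] - 1 if pos < len(pairs) else m - 1
    if lo_i <= hi_i and lo_j <= hi_j:
        pairs.insert(pos, (rng.randint(lo_i, hi_i), rng.randint(lo_j, hi_j)))
    return pairs


def evolve(s, t, pop_size=40, generations=300, seed=1):
    """Toy GA: keep the better half each generation and refill it with offspring."""
    rng = random.Random(seed)
    n, m = len(s), len(t)
    pop = [random_alignment(n, m, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: score(s, t, p), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; elites survive unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng.randint(0, n))
            if rng.random() < 0.7:
                child = mutate(child, n, m, rng)
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda p: score(s, t, p))


best = evolve("GGAUCCGAUCAG", "GGACCGUUCAG")
print(best, score("GGAUCCGAUCAG", "GGACCGUUCAG", best))
```

Because the elite half is carried over unchanged, the best score never decreases between generations. The crossover and mutation operators are written so that every offspring is still a valid (collinear) alignment. A structure-aware version would replace `score` with a fitness that also rewards a shared consensus structure.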

Highlights

Aligning RNA sequences with low sequence identity has been a challenging problem, since such a computation essentially requires an algorithm of high complexity to take structural conservation into account.
Pairwise comparison of genomic sequences: to efficiently search for non-coding RNA (ncRNA) candidates with low sequence identity, we focused our scan on relatively short (50 bp to 2,000 bp) low-identity regions located between two regions that are conserved at the sequence level.
Because the new genetic algorithm (GA) is accurate and efficient in both time and memory usage, we applied it to comparative ncRNA discovery between S. cerevisiae and related species, using an SVM trained with sequences and alignments taken from BRAliBase 2.1.
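The region selection described in the highlights can be sketched as follows. This is a minimal sketch with several assumptions:

- The conserved anchors have already been computed; how they were found is not stated here.
- Each anchor is a collinear hit with half-open coordinates in both genomes.
- A window qualifies only when the gaps between two consecutive anchors are 50 to 2,000 bp long in both genomes.

The names `Anchor` and `sandwiched_regions` are hypothetical.

```python
from typing import List, NamedTuple, Tuple

Interval = Tuple[int, int]


class Anchor(NamedTuple):
    """Hypothetical conserved block: half-open coordinates in genome A and genome B."""
    a_start: int
    a_end: int
    b_start: int
    b_end: int


def sandwiched_regions(anchors: List[Anchor], min_len: int = 50,
                       max_len: int = 2000) -> List[Tuple[Interval, Interval]]:
    """Return (region in A, region in B) for gaps between consecutive anchors."""
    regions = []
    ordered = sorted(anchors)  # sorted by position in genome A
    for left, right in zip(ordered, ordered[1:]):
        len_a = right.a_start - left.a_end
        len_b = right.b_start - left.b_end  # negative if the anchors are not collinear
        # Assumed rule: both genomes must contribute a gap within the length limits.
        if min_len <= len_a <= max_len and min_len <= len_b <= max_len:
            regions.append(((left.a_end, right.a_start), (left.b_end, right.b_start)))
    return regions


anchors = [Anchor(0, 100, 0, 90), Anchor(300, 400, 280, 380),
           Anchor(420, 500, 400, 480), Anchor(5000, 5100, 4800, 4900)]
print(sandwiched_regions(anchors))  # [((100, 300), (90, 280))]
```

In the pipeline outlined above, each pair of extracted segments would then be aligned and folded by the GA, and the resulting alignment scored by the SVM.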

Summary

Introduction

Aligning RNA sequences with low sequence identity has been a challenging problem, since such a computation essentially requires an algorithm of high complexity to take structural conservation into account. Sequence-alignment-based ncRNA finders such as RNAz [3], QRNA [4] and EvoFold [5] have been successfully applied to ncRNA discovery in various complete genomes [6,7,8,9,10]. While these methods are efficient enough for genome-scale analysis, they require a pre-computed alignment as input. In other words, they implicitly assume that an adequately accurate RNA sequence alignment can be obtained with a pure sequence alignment method (e.g. ClustalW) that does not explicitly consider conserved secondary structure. This assumption is acceptable for RNA sequences with relatively high sequence identity. For ncRNAs with low sequence identity, however, sequence-alignment-based methods can fail, because conserved secondary structure must be taken into account to accurately align structured RNA sequences that are poorly conserved at the sequence level.
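To make this argument concrete, here is a toy check on two made-up aligned hairpins (not data from the study). They share only about a third of their residues, yet both can form every base pair of a common consensus structure, because each stem pair has changed on both sides (compensatory substitutions). The identity definition and the canonical pair set (Watson-Crick plus G-U) are assumptions chosen for illustration.

```python
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}  # Watson-Crick plus G-U wobble


def base_pairs(structure):
    """Return (i, j) pairs from a dot-bracket string such as '((..))'."""
    stack, pairs = [], []
    for k, ch in enumerate(structure):
        if ch == "(":
            stack.append(k)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), k))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def identity(a, b):
    """Identical residues / columns where neither row is a gap (one common definition)."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in cols) / len(cols) if cols else 0.0


def supported_pairs(row, structure):
    """Number of consensus base pairs this aligned row can actually form."""
    return sum(row[i] + row[j] in CANONICAL for i, j in base_pairs(structure))


def compensatory_pairs(a, b, structure):
    """Pairs that are canonical in both rows although both partners differ between rows."""
    return sum(
        a[i] + a[j] in CANONICAL and b[i] + b[j] in CANONICAL
        and a[i] != b[i] and a[j] != b[j]
        for i, j in base_pairs(structure)
    )


consensus = "((((....))))"
seq_a = "GGGCAAAAGCCC"
seq_b = "CAUGAAAACAUG"  # made-up: every stem pair swapped, loop unchanged

print(f"identity = {identity(seq_a, seq_b):.2f}")  # identity = 0.33
print("pairs formed:", supported_pairs(seq_a, consensus),
      supported_pairs(seq_b, consensus))  # pairs formed: 4 4
print("compensatory pairs:", compensatory_pairs(seq_a, seq_b, consensus))  # compensatory pairs: 4
```

A purely sequence-based aligner sees only the 33% identity. A structure-aware method sees four fully conserved base pairs, each supported by a double substitution, which is the kind of evidence that is lost when the alignment is computed without considering structure.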


Journal: BMC Bioinformatics
Publication date: Dec 1, 2008
Citations: 49
License type: CC BY 2.0

Similar Papers

Chapter 5 - Using Genetic Algorithms for Pairwise and Multiple Sequence Alignments
Cédric Notredame
Evolutionary Computation in Bioinformatics, 01 Jan 2003

TOPAS: network-based structural alignment of RNA sequences.
Chun-Chi Chen ... Xiaoning Qian
Bioinformatics (Oxford, England), Vol. 35, 10 Jan 2019

Improved accuracy of multiple ncRNA alignment by incorporating structural information into a MAFFT-based framework
Kazutaka Katoh ... Hiroyuki Toh
BMC Bioinformatics, Vol. 9, 25 Apr 2008

Network-Based RNA Structural Alignment Through Optimal Local Neighborhood Matching
Hyun-Myung Woo ... Byung-Jun Yoon
01 Nov 2020
