Abstract

Homologous non-coding RNAs frequently exhibit domain insertions, where a branch of secondary structure is inserted in a sequence with respect to its homologs. Dynamic programming algorithms for common secondary structure prediction of multiple RNA homologs, however, do not account for these domain insertions. This paper introduces a novel dynamic programming methodology that explicitly accounts for the possibility of inserted domains when predicting common RNA secondary structures. The algorithm is implemented as Dynalign II, an update to the Dynalign software package for predicting the common secondary structure of two RNA homologs. This update is accomplished with negligible increase in computational cost. Benchmarks on ncRNA families with domain insertions validate the method. Over base pairs occurring in inserted domains, Dynalign II improves accuracy over Dynalign, attaining 80.8% sensitivity (compared with 14.4% for Dynalign) and 91.4% positive predictive value (PPV) for tRNA; 66.5% sensitivity (compared with 38.9% for Dynalign) and 57.0% PPV for RNase P RNA; and 50.1% sensitivity (compared with 24.3% for Dynalign) and 58.5% PPV for SRP RNA. Compared with Dynalign, Dynalign II also exhibits statistically significant improvements in overall sensitivity and PPV. Dynalign II is available as a component of RNAstructure, which can be downloaded from http://rna.urmc.rochester.edu/RNAstructure.html.

Highlights

  • In the past three decades, RNA has been studied not just for its role in protein synthesis but also for its large number of non-coding roles, in which RNA directly controls cellular function [1,2,3,4,5,6]

  • An inserted domain is a subsequence, inserted in one homolog relative to one or more other homologs, that forms a substructure whose base pairs occur between nucleotides within the inserted subsequence

  • The developed methodology is validated by benchmarking Dynalign II on non-coding RNA (ncRNA) families that exhibit domain insertions and other structural variations: tRNA, RNase P RNA and SRP RNA


Summary

INTRODUCTION

In the past three decades, RNA has been studied not just for its role in protein synthesis but also for its large number of non-coding roles, in which RNA directly controls cellular function [1,2,3,4,5,6]. Other barriers include variation of helix and loop length and base pair opening caused by nucleotide mutations between homologous sequences. This paper describes a novel technique that allows and accounts for domain insertions in the prediction of conserved structures for two unaligned sequences. In addition to domain insertions, Dynalign II accommodates other types of structural variation, namely base pair openings and stem extensions. The updates to Dynalign handle these structural variations with negligible increase in computational cost by using pre-computed values of ΔG° for inserted domains, obtained from single-sequence folding of each homolog. The developed methodology is validated by benchmarking Dynalign II on ncRNA families that exhibit domain insertions and other structural variations: tRNA, RNase P RNA and SRP RNA. The Discussion section closes the paper with concluding remarks and a summary.
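To make the notion of an inserted domain concrete, the following minimal sketch checks whether a stretch of one sequence forms a self-contained substructure, that is, whether every base pair touching the stretch has both partners inside it. The function name `is_self_contained_domain`, the 1-based inclusive positions and the representation of a structure as a list of `(i, j)` pairs are assumptions made for illustration only; this is not how Dynalign II identifies or scores inserted domains.

```python
def is_self_contained_domain(pairs, start, end):
    """Return True if positions start..end (inclusive) hold at least one
    base pair and no base pair crosses the boundary of that stretch.

    pairs: iterable of (i, j) base pairs (hypothetical representation).
    """
    pairs_inside = 0
    for i, j in pairs:
        i_in = start <= i <= end
        j_in = start <= j <= end
        if i_in != j_in:
            # One partner inside, one outside: the stretch is not closed.
            return False
        if i_in:
            pairs_inside += 1
    # An unpaired stretch is an ordinary insertion, not a structural domain.
    return pairs_inside > 0


# Toy structure: an outer helix (1-30, 2-29) enclosing a hairpin (10-20, 11-19).
toy_pairs = [(1, 30), (2, 29), (10, 20), (11, 19)]
hairpin_is_domain = is_self_contained_domain(toy_pairs, 10, 20)
crossing_is_domain = is_self_contained_domain(toy_pairs, 2, 20)
unpaired_is_domain = is_self_contained_domain(toy_pairs, 3, 9)
```

In this toy structure the hairpin at positions 10 to 20 qualifies, while a stretch that cuts through the outer helix or contains no pairs does not.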

MATERIALS AND METHODS
Evaluation
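The sensitivity and PPV figures quoted in the abstract can be illustrated with a short sketch. Sensitivity is taken here as the fraction of known base pairs that are predicted, and PPV as the fraction of predicted base pairs that are known. The function name `score_prediction` and the exact-match criterion are assumptions for illustration; the paper's own scoring may use a more lenient definition of a correctly predicted pair.

```python
def score_prediction(predicted, known):
    """Return (sensitivity, ppv) for two collections of base pairs (i, j).

    Assumption: a predicted pair counts as correct only on an exact match
    with a known pair.
    """
    # Normalize each pair so that (i, j) and (j, i) compare equal.
    pred = {tuple(sorted(p)) for p in predicted}
    ref = {tuple(sorted(p)) for p in known}
    true_pos = len(pred & ref)
    sensitivity = true_pos / len(ref) if ref else 0.0
    ppv = true_pos / len(pred) if pred else 0.0
    return sensitivity, ppv


# Toy data, not taken from the paper's benchmarks.
known_pairs = [(1, 20), (2, 19), (3, 18), (5, 12)]
predicted_pairs = [(1, 20), (2, 19), (6, 11)]
sens, ppv = score_prediction(predicted_pairs, known_pairs)
```

With this toy data, two of the four known pairs are recovered and two of the three predicted pairs are correct. Restricting `known` and `predicted` to pairs that lie inside inserted domains would give per-domain figures of the kind reported in the abstract.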
RESULTS
DISCUSSION