Abstract

Background

RNA inverse folding is the problem of finding one or more sequences that fold into a user-specified target structure s0, i.e. whose minimum free energy secondary structure is identical to the target s0. Here we consider the ensemble of all RNA sequences that have low free energy with respect to a given target s0.

Results

We introduce the program RNAdualPF, which computes the dual partition function Z∗, defined as the sum of Boltzmann factors exp(−E(a, s0)/RT) of all RNA nucleotide sequences a compatible with target structure s0. Using RNAdualPF, we efficiently sample RNA sequences that approximately fold into s0, where additionally the user can specify IUPAC sequence constraints at certain positions, and whether to include dangles (energy terms for stacked, single-stranded nucleotides). Moreover, since we also compute the dual partition function Z∗(k) over all sequences having GC-content k, the user can require that all sampled sequences have a precise, specified GC-content.

Using Z∗, we compute the dual expected energy 〈E∗〉, and use it to show that natural RNAs from the Rfam 12.0 database have higher minimum free energy than expected, thus suggesting that functional RNAs are under evolutionary pressure to be only marginally thermodynamically stable.

We show that C. elegans precursor microRNA (pre-miRNA) is significantly non-robust with respect to mutations, by comparing the robustness of each wild type pre-miRNA sequence with 2000 [resp. 500] sequences of the same GC-content generated by RNAdualPF, which approximately [resp. exactly] fold into the wild type target structure. We confirm and strengthen earlier findings that precursor microRNAs and bacterial small noncoding RNAs display plasticity, a measure of structural diversity.

Conclusion

We describe RNAdualPF, which rapidly computes the dual partition function Z∗ and samples sequences having low energy with respect to a target structure, allowing sequence constraints and specified GC-content. Using different inverse folding software, another group had earlier shown that pre-miRNA is mutationally robust, even controlling for compositional bias. Our opposite conclusion suggests a cautionary note that computationally based insights into molecular evolution may heavily depend on the software used.

C/C++ software for RNAdualPF is available at http://bioinformatics.bc.edu/clotelab/RNAdualPF.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-1280-6) contains supplementary material, which is available to authorized users.

Highlights

  • RNA inverse folding is the problem of finding one or more sequences that fold into a user-specified target structure s0, i.e. whose minimum free energy secondary structure is identical to the target s0

  • Given a target secondary structure s0, we describe below an algorithm to compute the dual partition function Z∗(s0), defined as the sum of all Boltzmann factors exp(−E(a, s0)/RT), where the sum is taken over all RNA sequences a ∈ AA(s0)

  • An additional challenge of computing the dual partition function with GC-content control is the combinatorial problem of efficiently counting the number N of instantiations of the external loop, consisting of all positions external to every base pair, with GC-content k, where the user can stipulate that certain positions are constrained to contain nucleotides consistent with IUPAC codes
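
The counting problem in the last highlight can be illustrated with a small dynamic program. The following is a minimal sketch under simplifying assumptions, not the routine used in RNAdualPF: it treats the unpaired external-loop positions as independent, ignores dangles and the base pairs that close the external loop, and takes the constraints as a string of IUPAC codes (the function name `count_by_gc` is hypothetical). It returns, for every k, the number of instantiations with exactly k G or C nucleotides.

```python
# IUPAC nucleotide codes mapped to the RNA nucleotides they permit.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU",
    "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG",
    "N": "ACGU",
}


def count_by_gc(constraints):
    """Return counts, where counts[k] is the number of sequences that
    satisfy the per-position IUPAC constraints and contain exactly k
    G or C nucleotides."""
    counts = [1]  # empty sequence: one instantiation with GC-content 0
    for code in constraints:
        allowed = IUPAC[code]
        gc = sum(1 for nt in allowed if nt in "GC")  # choices adding one G/C
        au = len(allowed) - gc                        # choices adding none
        nxt = [0] * (len(counts) + 1)
        for k, ways in enumerate(counts):
            nxt[k] += ways * au
            nxt[k + 1] += ways * gc
        counts = nxt
    return counts


if __name__ == "__main__":
    # Three external positions: unconstrained, purine (A or G), fixed A.
    print(count_by_gc("NRA"))
```

The table is extended one position at a time, so the running time is quadratic in the number of positions rather than exponential as in explicit enumeration.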

Summary

Background

An RNA sequence a is defined to be robust if its neutrality η(a) is greater than the average neutrality of 1000 control sequences, generated by the program RNAinverse [2], that fold into the same target structure s0. We describe the algorithm RNAdualPF, which generates sequences that have low free energy with respect to a user-specified target structure s0; that is, the inherent bias of RNAdualPF is known, unlike that of other inverse folding algorithms. The RNAdualPF software efficiently computes the dual partition function Z∗ and samples from the low energy ensemble of sequences that are compatible with a given secondary structure s0. Given a target secondary structure s0, we describe below an algorithm to compute the dual partition function Z∗(s0), defined as the sum of all Boltzmann factors exp(−E(a, s0)/RT), where the sum is taken over all RNA sequences a ∈ AA(s0). Since AU base pairs that close a loop are energetically unfavorable in the Turner energy model, we define an AU penalty term, AUpenalty.
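
To make the definition of Z∗(s0) concrete, the sketch below evaluates it by brute-force enumeration for a very short target structure. It is an illustration only, under loudly stated assumptions: the energy function is a hypothetical toy model that assigns a fixed energy to each base pair (it is not the Turner model and has no loop, stacking, dangle, or AU penalty terms), the compatible sequences are those whose paired positions form GC, AU, or GU pairs, and the dual expected energy is taken to be the usual Boltzmann-weighted average of E(a, s0). RNAdualPF itself uses an efficient algorithm rather than enumeration.

```python
import itertools
import math

RT = 0.0019872 * 310.15  # gas constant (kcal/(mol K)) times 37 degrees C in K

# Hypothetical per-base-pair energies in kcal/mol; NOT Turner parameters.
TOY_PAIR_ENERGY = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}


def base_pairs(structure):
    """Parse dot-bracket notation into a list of (i, j) index pairs."""
    stack, pairs = [], []
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.append((stack.pop(), j))
    return pairs


def toy_energy(seq, pairs):
    """Toy energy of seq on the structure, or None if seq is incompatible."""
    total = 0.0
    for i, j in pairs:
        e = TOY_PAIR_ENERGY.get((seq[i], seq[j]))
        if e is None:
            return None
        total += e
    return total


def dual_partition_function(structure):
    """Return (Z*, [Z*(k) for each GC-content k], expected energy),
    computed by enumerating all 4**n sequences (small n only)."""
    pairs = base_pairs(structure)
    n = len(structure)
    z_by_gc = [0.0] * (n + 1)
    weighted_energy = 0.0
    for seq in itertools.product("ACGU", repeat=n):
        e = toy_energy(seq, pairs)
        if e is None:
            continue  # sequence cannot form the target base pairs
        w = math.exp(-e / RT)  # Boltzmann factor exp(-E(a, s0)/RT)
        z_by_gc[sum(nt in "GC" for nt in seq)] += w
        weighted_energy += e * w
    z = sum(z_by_gc)
    return z, z_by_gc, weighted_energy / z


if __name__ == "__main__":
    z, z_by_gc, mean_e = dual_partition_function("((...))")
    print(z, z_by_gc, mean_e)
```

Sampling a sequence with GC-content exactly k then amounts to drawing from the sequences that contribute to Z∗(k), each with probability proportional to its Boltzmann factor.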

