Abstract
Background
The computational identification of RNAs in genomic sequences requires the identification of signals of RNA sequences. Shannon base pairing entropy is an indicator of RNA secondary structure fold certainty in the detection of structural non-coding RNAs (ncRNAs). Under the Boltzmann ensemble of secondary structures, the probability of a base pair is estimated from its frequency across all alternative equilibrium structures. However, such an entropy has yet to deliver the desired performance for distinguishing ncRNAs from random sequences. Developing novel methods to improve the performance of the entropy measure may lead to more effective ncRNA gene finding based on structure detection.
Results
This paper shows that the performance of base pairing entropy as a measure can be significantly improved with a constrained secondary structure ensemble, in which only canonical base pairs are assumed to occur in energetically stable stems in a fold. This constraint reduces the secondary structure space and may lower the probabilities of base pairs unfavorable to the native fold. Indeed, compared to shuffled sequences, base pairing entropies computed with this constrained model show substantially narrowed gaps in Z-scores between ncRNAs, as well as drastic increases in the Z-score for all 13 tested ncRNA sets.
Conclusions
These results suggest that effective structure-based ncRNA gene finding methods can be developed by investigating the secondary structure ensembles of ncRNAs.
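The following Python sketch makes the ensemble quantities concrete. It enumerates every secondary structure of a short sequence, weights each structure by its Boltzmann factor, and derives base pair probabilities and the base pairing entropy. It is a toy model built on stated assumptions. The energy function (a fixed, hypothetical `STACK_ENERGY` per stacked pair), the minimum hairpin size, and the `require_stacking` filter are all illustrative choices. That filter discards any structure containing an isolated base pair, as a rough stand-in for restricting canonical pairs to stems. None of these is the paper's constrained model or a real thermodynamic parameter set. Brute-force enumeration grows exponentially with length, so it only suits short sequences. Practical tools obtain the same probabilities with McCaskill's dynamic programming algorithm.

```python
import math
from functools import lru_cache
from itertools import product

# Canonical pairs: Watson-Crick plus G-U wobble.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3          # minimum number of unpaired bases enclosed by a pair
KT = 0.0019872 * 310.15  # RT in kcal/mol at 37 degrees C (about 0.616)
STACK_ENERGY = -1.5      # hypothetical toy energy per stacked pair of base pairs


def enumerate_structures(seq):
    """Return every non-crossing set of canonical pairs as a tuple of (i, j) pairs."""

    @lru_cache(maxsize=None)
    def fold(i, j):
        if i >= j:
            return ((),)
        results = list(fold(i + 1, j))  # case 1: base i stays unpaired
        for k in range(i + MIN_HAIRPIN + 1, j + 1):  # case 2: i pairs with k
            if (seq[i], seq[k]) in CANONICAL:
                for inner, outer in product(fold(i + 1, k - 1), fold(k + 1, j)):
                    results.append(((i, k),) + inner + outer)
        return tuple(results)

    return fold(0, len(seq) - 1)


def energy(structure):
    # Toy energy: only stacked pairs (i, j) on (i+1, j-1) contribute.
    pairs = set(structure)
    return sum(STACK_ENERGY for (i, j) in pairs if (i + 1, j - 1) in pairs)


def is_stacked(structure):
    # True when no base pair is isolated, i.e. every pair sits in a stem of length >= 2.
    pairs = set(structure)
    return all((i + 1, j - 1) in pairs or (i - 1, j + 1) in pairs for (i, j) in pairs)


def base_pair_entropy(seq, require_stacking=False):
    """Return (entropy, ensemble size) with entropy = -sum p_ij ln p_ij over pairs."""
    structures = enumerate_structures(seq)
    if require_stacking:
        structures = [s for s in structures if is_stacked(s)]
    weights = [math.exp(-energy(s) / KT) for s in structures]
    z = sum(weights)  # partition function of the (possibly constrained) ensemble
    prob = {}
    for s, w in zip(structures, weights):
        for pair in s:  # p_ij = (sum of Boltzmann factors of structures with (i, j)) / Z
            prob[pair] = prob.get(pair, 0.0) + w / z
    entropy = -sum(p * math.log(p) for p in prob.values() if p > 0)
    return entropy, len(structures)


if __name__ == "__main__":
    seq = "GGGAAACCCAGGGAAACCC"  # hypothetical short test sequence
    for constrained in (False, True):
        h, size = base_pair_entropy(seq, require_stacking=constrained)
        print(f"constrained={constrained}: {size} structures, entropy={h:.3f}")
```

Because the constraint only filters structures and never adds any, the constrained ensemble can only shrink. Whether the entropy then rises or falls for a given sequence depends on how the remaining Boltzmann weight is distributed across pairs.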
Highlights
The computational identification of RNAs in genomic sequences requires the identification of signals of RNA sequences
The entropy $-\sum_{i,j} p_{i,j} \log p_{i,j}$ of base pairings between all bases i and j can be calculated from the partition function of the Boltzmann secondary structure ensemble, which is the space of all alternative secondary structures of a given sequence; the probability $p_{i,j}$ is the total of the Boltzmann factors over all alternative equilibrium structures that contain the base pair (i, j), divided by the partition function [17]
The results on these datasets were analyzed with six different types of measures, including the Z-score and p-value of the minimum free energy (MFE) and the Shannon base pairing entropy [18], in comparison with random sequences
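The sketch below shows how a Z-score and an empirical p-value against random sequences can be computed. It is a minimal Python illustration built on two assumptions. First, it uses a simple mononucleotide shuffle. The paper's shuffling procedure is not described here, and dinucleotide-preserving shuffles are also common in practice. Second, `toy_hairpin_score` is a hypothetical stand-in for a real measure such as the MFE or the base pairing entropy. Any function that maps a sequence to a number can be passed as `measure`.

```python
import math
import random
import statistics

CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def shuffle_seq(seq, rng):
    # Mononucleotide shuffle: keeps base composition, destroys order.
    return "".join(rng.sample(seq, len(seq)))


def z_score_and_p_value(seq, measure, n_shuffles=1000, seed=0, lower_is_better=True):
    """Compare measure(seq) with its distribution over shuffled copies of seq."""
    if n_shuffles < 2:
        raise ValueError("need at least two shuffles to estimate a standard deviation")
    rng = random.Random(seed)
    observed = measure(seq)
    background = [measure(shuffle_seq(seq, rng)) for _ in range(n_shuffles)]
    mu = statistics.mean(background)
    sd = statistics.stdev(background)
    z = (observed - mu) / sd if sd > 0 else math.nan
    # Empirical one-sided p-value with a +1 pseudocount so it is never exactly 0.
    if lower_is_better:
        hits = sum(b <= observed for b in background)
    else:
        hits = sum(b >= observed for b in background)
    return z, (hits + 1) / (n_shuffles + 1)


def toy_hairpin_score(seq):
    # Hypothetical stand-in for an MFE: minus the number of complementary
    # mirror-image positions (lower = more hairpin-like).
    n = len(seq)
    return -sum((seq[i], seq[n - 1 - i]) in CANONICAL for i in range(n // 2))


if __name__ == "__main__":
    z, p = z_score_and_p_value("GGGGCAAAAGCCCC", toy_hairpin_score)
    print(f"Z = {z:.2f}, empirical p = {p:.4f}")
```

With `lower_is_better=True` (as for an MFE), a strongly negative Z-score and a small p-value indicate that the native sequence is more structured than its shuffles. For a measure where higher values signal structure, pass `lower_is_better=False`.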
Summary
The computational identification of RNAs in genomic sequences requires the identification of signals of RNA sequences. Shannon base pairing entropy is an indicator of RNA secondary structure fold certainty in the detection of structural non-coding RNAs (ncRNAs). Under the Boltzmann ensemble of secondary structures, the probability of a base pair is estimated from its frequency across all alternative equilibrium structures. However, such an entropy has yet to deliver the desired performance for distinguishing ncRNAs from random sequences. The possibility that folded secondary structures may lead to successful ab initio ncRNA gene prediction methods has energized leading groups to independently develop structure-based ncRNA gene finding methods [7,8]. Both measures perform impressively on precursor miRNAs but not as well on tRNAs and some rRNAs [14,18].