Abstract

Self-contained structured domains of RNA sequences often have distinct molecular functions. Determining the boundaries of the structured domains of a non-coding RNA (ncRNA) is needed by many ncRNA gene finder programs that predict RNA secondary structures in aligned genomes, because these methods do not necessarily provide precise information about the boundaries or the location of the RNA structure inside the predicted ncRNA. Even without a structure prediction, it is of interest to search for structured domains, for example to find common RNA motifs in RNA-protein binding assays. A precise definition of the boundaries is essential for downstream analyses such as RNA structure modelling, e.g., through covariance models, and RNA structure clustering in the search for common motifs. Such efforts have so far focused on single sequences; here we present a comparison of boundary definition between single sequences and multiple sequence alignments. We also present a novel approach, named RNAbound, for finding the boundaries that is based on probabilities of evolutionarily conserved base pairings. We tested the performance of two different methods on a limited number of Rfam families, using the annotated structured RNA regions in the human genome and their multiple sequence alignments created from 14 species. The results show that multiple sequence alignments improve the boundary prediction for branched structures compared to single sequences, independent of the chosen method. The actual performance of the two methods differs on single hairpin structures and branched structures. For the RNA families with branched structures, including transfer RNAs (tRNAs) and small nucleolar RNAs (snoRNAs), RNAbound improves the boundary predictions using multiple sequence alignments to median differences of −6 and −11.5 nucleotides (nts) for the left and right boundary, respectively (window size of 200 nts).

Highlights

  • The function of RNA is often guided by its structural conformation, which is in turn determined by its sequence composition

  • The results show that multiple sequence alignments improve the boundary prediction for branched structures compared to single sequences independent of the chosen method

  • The three components can be described by scores based on base pairing probabilities: (1) score I_bp is the geometric mean of paired probabilities between bases inside [k, l]; (2) score O_¬bp is the geometric mean of unpaired probabilities of bases inside [k, l] to bases outside of [k, l]; and (3) score

Introduction

The function of RNA is often guided by its structural conformation, which is in turn determined by its sequence composition. Long non-coding RNAs (lncRNAs) can contain local functional structures, e.g., lncRNA GAS5 forms a secondary structure that binds the. The definition of RNA structure domains has been addressed at the single sequence level, first explicitly by Dotu et al. [2]. They described a fitness function for all segmentations of subwords of a sequence based on the base pairing probability matrix. These matrices are usually calculated from the respective sequence by McCaskill’s partition function approach [3].
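To make the idea of scoring a candidate domain [k, l] from a base pairing probability matrix more concrete, the following Python sketch computes two segment scores of the kind named in the highlights: a geometric mean of pairing probabilities inside the segment, and a geometric mean of probabilities of not pairing with bases outside it. The exact formulas are not given in this text, so the per-base aggregation, the clamping constant `eps`, the function names, and the toy matrix are all assumptions made for illustration. This is not the RNAbound implementation, nor the fitness function of Dotu et al.

```python
import math


def geometric_mean(values, eps=1e-9):
    """Geometric mean of probabilities; each value is clamped to eps (an assumed
    constant) so that log() is defined for zero probabilities."""
    logs = [math.log(max(v, eps)) for v in values]
    return math.exp(sum(logs) / len(logs))


def inside_pairing_score(P, k, l):
    """Assumed reading of the inside score: for each base i in [k, l], sum its
    pairing probabilities to partners j in [k, l], then take the geometric mean."""
    segment = range(k, l + 1)
    return geometric_mean([sum(P[i][j] for j in segment) for i in segment])


def outside_unpaired_score(P, k, l):
    """Assumed reading of the outside score: for each base i in [k, l], take the
    probability that i pairs with no base outside [k, l], then the geometric mean."""
    outside = [j for j in range(len(P)) if j < k or j > l]
    return geometric_mean(
        [1.0 - sum(P[i][j] for j in outside) for i in range(k, l + 1)]
    )


def toy_matrix():
    """Hypothetical symmetric 6x6 pairing probability matrix (0-based indices).
    The values are invented and do not come from a real fold."""
    n = 6
    P = [[0.0] * n for _ in range(n)]
    for i, j, p in [(1, 4, 0.9), (2, 3, 0.8), (0, 5, 0.1), (0, 4, 0.05)]:
        P[i][j] = P[j][i] = p
    return P


if __name__ == "__main__":
    P = toy_matrix()
    for k, l in [(1, 4), (0, 3)]:
        print((k, l), inside_pairing_score(P, k, l), outside_unpaired_score(P, k, l))
```

In this toy case the segment [1, 4] contains both high-probability pairs and therefore scores higher on both measures than [0, 3], which cuts through the pair (1, 4). Under this per-base reading, unpaired loop bases pull the inside score towards zero, so a practical scoring scheme would need to treat them differently; how the original methods handle this is not stated here.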
