Abstract
Background: The sequence of nucleotides in an RNA determines the possible base pairs for an RNA fold and thus also determines the overall shape and function of an RNA. The Swellix program presented here combines a helix abstraction with a combinatorial approach to the RNA folding problem in order to compute all possible non-pseudoknotted RNA structures for RNA sequences. The Swellix program builds on the Crumple program and can include experimental constraints on global RNA structures such as the minimum number and lengths of helices from crystallography, cryoelectron microscopy, or in vivo crosslinking and chemical probing methods.

Results: The conceptual advance in Swellix is to count helices and generate all possible combinations of helices rather than counting and combining base pairs. Swellix bundles similar helices and includes improvements in memory use and efficient parallelization. Biological applications of Swellix are demonstrated by computing the reduction in conformational space and entropy due to naturally modified nucleotides in tRNA sequences and by motif searches in Human Endogenous Retroviral (HERV) RNA sequences. The Swellix motif search reveals occurrences of protein and drug binding motifs in the HERV RNA ensemble that do not occur in minimum free energy or centroid predicted structures.

Conclusions: Swellix presents significant improvements over Crumple in terms of efficiency and memory use. The efficient parallelization of Swellix enables the computation of sequences as long as 418 nucleotides with sufficient experimental constraints. Thus, Swellix provides a practical alternative to free energy minimization tools when multiple structures, kinetically determined structures, or complex RNA-RNA and RNA-protein interactions are present in an RNA folding problem.
Highlights
The sequence of nucleotides in an RNA determines the possible base pairs for an RNA fold and determines the overall shape and function of an RNA
Approximately 80% of the human genome is transcribed into RNA, but only 2% of the genome codes for proteins [1]
The flood of RNA sequence information from next-generation high-throughput sequencing technology and the explosion of discoveries of non-coding RNA create an enormous need for RNA structure prediction tools
Summary
Approximately 80% of the human genome is transcribed into RNA, but only 2% of the genome codes for proteins [1]. This discovery reveals the abundance of noncoding RNA with as yet undetermined function. RNA structure prediction tools form a key component in many genome-wide RNA analysis pipelines [2, 3, 4, 5]. Many of these new RNA discoveries reveal RNA sequences with multiple functional folds or partially unfolded RNA [2, 4, 6, 7]. This paper presents a new computational method, Swellix, that efficiently computes all possible non-pseudoknotted structures for an RNA sequence by counting helices rather than base pairs. Swellix also counts RNA motif frequencies and provides insight into possible functional interactions that may not be present in low-energy structure predictions.
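To give a sense of the conformational space being enumerated, the sketch below counts non-pseudoknotted secondary structures with a textbook base-pair recursion. It is not the Swellix algorithm, which combines helices rather than base pairs, and it only counts structures instead of generating them. The pairing rule (Watson-Crick plus G-U wobble), the minimum hairpin loop of 3 nucleotides, the `blocked` parameter (a crude stand-in for a modified nucleotide that cannot pair), and the use of S = R ln W for conformational entropy are all assumptions made for illustration; the function names are hypothetical.

```python
from functools import lru_cache
import math

# Assumed pairing rule for this sketch: Watson-Crick pairs plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def count_structures(seq, min_loop=3, blocked=frozenset()):
    """Count non-pseudoknotted structures of seq, including the unfolded one.

    blocked holds 0-based positions that are not allowed to pair
    (a hypothetical way to model a modified nucleotide).
    """
    @lru_cache(maxsize=None)
    def count(i, j):
        # Structures on seq[i..j] inclusive; too-short spans hold no pair.
        if j - i < min_loop + 1:
            return 1
        total = count(i + 1, j)  # case 1: base i stays unpaired
        if i not in blocked:
            # case 2: base i pairs with k, splitting the span in two
            for k in range(i + min_loop + 1, j + 1):
                if k not in blocked and (seq[i], seq[k]) in PAIRS:
                    total += count(i + 1, k - 1) * count(k + 1, j)
        return total

    return count(0, len(seq) - 1)


def entropy_reduction(w_before, w_after):
    """Entropy change R*ln(w_before/w_after), assuming S = R ln W."""
    return R_KCAL * math.log(w_before / w_after)


if __name__ == "__main__":
    seq = "GGGAAACCC"
    w_free = count_structures(seq)
    w_mod = count_structures(seq, blocked=frozenset({0}))
    print(w_free, w_mod, entropy_reduction(w_free, w_mod))
```

For this toy 9-nucleotide sequence, blocking the first G from pairing halves the number of allowed structures. Counts from this recursion grow very quickly with sequence length, which is why experimental constraints and a coarser helix-level description matter for longer RNAs.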