Abstract
The RFAM database defines families of ncRNAs by means of sequence similarities that are sufficient to establish homology. In some cases, such as microRNAs and box H/ACA snoRNAs, functional commonalities define classes of RNAs that are characterized by structural similarities, and typically consist of multiple RNA families. Recent advances in high-throughput transcriptomics and comparative genomics have produced very large sets of putative noncoding RNAs and regulatory RNA signals. For many of them, evidence for stabilizing selection acting on their secondary structures has been derived, and at least approximate models of their structures have been computed. The overwhelming majority of these hypothetical RNAs cannot be assigned to established families or classes. We present here a structure-based clustering approach that is capable of extracting putative RNA classes from genome-wide surveys for structured RNAs. The LocARNA (local alignment of RNA) tool implements a novel variant of the Sankoff algorithm that is sufficiently fast to deal with several thousand candidate sequences. The method is also robust against false positive predictions, i.e., a contamination of the input data with unstructured or nonconserved sequences. We have successfully tested the LocARNA-based clustering approach on the sequences of the RFAM-seed alignments. Furthermore, we have applied it to a previously published set of 3,332 predicted structured elements in the Ciona intestinalis genome (Missal K, Rose D, Stadler PF (2005) Noncoding RNAs in Ciona intestinalis. Bioinformatics 21 (Supplement 2): i77–i78). In addition to recovering, e.g., tRNAs as a structure-based class, the method identifies several RNA families, including microRNA and snoRNA candidates, and suggests several novel classes of ncRNAs for which to date no representative has been experimentally characterized.
Highlights
Starting with the discovery of microRNAs [1,2,3] and the advent of genome-wide transcriptomics [4,5,6], it has become obvious that RNA plays a large variety of important, often regulatory, roles in living organisms that extend far beyond being a mere intermediate in protein biosynthesis.
It was believed that the control of processes in living organisms is performed almost exclusively by proteins.
Evaluation of the clustering procedure: to evaluate the quality of our clustering approach, we applied our procedure to the sequences in the RFAM seed alignments.
Summary
Starting with the discovery of microRNAs [1,2,3] and the advent of genome-wide transcriptomics [4,5,6], it has become obvious that RNA plays a large variety of important, often regulatory, roles in living organisms that extend far beyond being a mere intermediate in protein biosynthesis. EvoFold [10] and RNAz [9,13] are efficient enough to be applied to genome-wide surveys in mammals [10,13] and other metazoan clades [14,15]. Both approaches start from multiple sequence alignments. While EvoFold uses the stochastic context-free grammar (SCFG) approach pioneered by qrna [7], RNAz is based on evaluating the folding thermodynamics. Both approaches classify input alignments either as unstructured or as possessing a common RNA secondary structure; in the latter case, they provide a prediction for the consensus structure of the aligned sequences.
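To illustrate how a clustering step of the kind described in the abstract could group candidates into putative classes, the following is a minimal sketch only. It assumes that pairwise structural alignment scores have already been converted into distances; the text above does not specify the linkage method, the score-to-distance conversion, or any cutoff, so the average-linkage rule, the `threshold` parameter, the sequence names, and the distance values are all hypothetical and chosen for illustration.

```python
from itertools import combinations


def average_linkage(names, dist, threshold):
    """Agglomerative clustering on a pairwise distance table.

    names:     list of sequence identifiers (hypothetical here)
    dist:      dict mapping frozenset({a, b}) -> distance between a and b
    threshold: merging stops once the closest clusters are farther apart
               than this value (an assumed, illustrative cutoff)
    """
    clusters = [[n] for n in names]

    def cluster_distance(a, b):
        # Average linkage: mean of all member-to-member distances.
        total = sum(dist[frozenset((x, y))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]),
        )
        if cluster_distance(clusters[i], clusters[j]) > threshold:
            break  # remaining clusters are too dissimilar to merge
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return [sorted(c) for c in clusters]


# Toy input: two tight groups plus one unrelated ("contaminating") sequence.
names = ["tRNA_a", "tRNA_b", "mir_a", "mir_b", "noise"]
dist = {frozenset(p): 0.9 for p in combinations(names, 2)}
dist[frozenset(("tRNA_a", "tRNA_b"))] = 0.1
dist[frozenset(("mir_a", "mir_b"))] = 0.2

result = average_linkage(names, dist, threshold=0.5)
print(sorted(result))
```

In this toy input the two similar pairs are merged, while the unrelated sequence stays in a cluster of its own. This mirrors, in a very simplified way, the idea that unstructured or nonconserved sequences in the input need not disturb the recovered classes.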