Abstract
Noncoding RNAs (ncRNAs) are important functional RNAs that do not code for proteins. We present a highly efficient computational pipeline for discovering cis-regulatory ncRNA motifs de novo. The pipeline differs from previous methods in that it is structure-oriented, does not require a multiple-sequence alignment as input, and is capable of detecting RNA motifs with low sequence conservation. We also integrate RNA motif prediction with RNA homolog search, which significantly improves the quality of the RNA motifs. Here, we report the results of applying this pipeline to Firmicute bacteria. Our top-ranking motifs include most known Firmicute elements found in the RNA family database (Rfam). Comparing our motif models with Rfam's hand-curated motif models, we achieve high accuracy in both membership prediction and base-pair-level secondary structure prediction (at least 75% average sensitivity and specificity on both tasks). Among the ncRNA candidates not in Rfam, we find compelling evidence that some are functional, and we analyze several potential ribosomal protein leaders in depth.
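Base-pair-level accuracy compares a predicted secondary structure with a reference structure pair by pair. The Python below is a minimal sketch of one common way to compute it from dot-bracket strings. It assumes that sensitivity is the fraction of reference pairs recovered and that "specificity" means the fraction of predicted pairs that are correct, TP/(TP+FP), as the term is often used in RNA structure prediction. The abstract does not spell out the exact definitions, and the function names are hypothetical.

```python
def base_pairs(dot_bracket: str) -> set[tuple[int, int]]:
    """Return the set of (i, j) base pairs, i < j, encoded by a dot-bracket string.

    Only '(' and ')' are treated as pairing symbols; every other character
    (e.g. '.', ':', ',') is treated as unpaired in this simplified sketch.
    """
    stack: list[int] = []
    pairs: set[tuple[int, int]] = set()
    for pos, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.add((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def pair_accuracy(predicted: str, reference: str) -> tuple[float, float]:
    """Return (sensitivity, specificity) over base pairs.

    sensitivity = TP / (TP + FN): fraction of reference pairs that were predicted.
    specificity = TP / (TP + FP): fraction of predicted pairs found in the reference.
    """
    if len(predicted) != len(reference):
        raise ValueError("structures must cover the same positions")
    pred, ref = base_pairs(predicted), base_pairs(reference)
    tp = len(pred & ref)
    sensitivity = tp / len(ref) if ref else 0.0
    specificity = tp / len(pred) if pred else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    reference = "((((....))))"
    predicted = "(((......)))"
    print(pair_accuracy(predicted, reference))
```

In the toy example the prediction misses only the innermost pair, so it scores 0.75 sensitivity and 1.0 specificity. Per-family scores like these would then be averaged across families; how the authors weighted that average is not stated here.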
Highlights
Recent discoveries of novel noncoding RNAs such as microRNAs and riboswitches suggest that ncRNAs have important and diverse functional and regulatory roles that impact gene transcription, translation, localization, replication, and degradation [1,2,3].
Our pipeline consists of the following major steps. (See Figure 1, Materials and Methods, and the online supplement at http://bio.cs.washington.edu/supplements/yzizhen/pipeline for more details.) First, we used the National Center for Biotechnology Information’s (NCBI’s) Conserved Domain Database (CDD) [16] to identify homologous gene sets.
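The text says only that CDD was used to identify homologous gene sets. As a minimal sketch of what such a step could feed into, the Python below groups genes by an assigned CDD domain and extracts the sequence immediately upstream of each gene, where cis-regulatory RNAs such as riboswitches and leaders typically reside. The `Gene` record, the one-domain-per-gene assumption, the upstream window length, and all function names are hypothetical; the pipeline's actual grouping rules and region boundaries are described in its Materials and Methods and supplement.

```python
from collections import defaultdict
from dataclasses import dataclass

# Maps each DNA base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record; field names are illustrative only."""
    gene_id: str
    domain_id: str  # CDD domain assigned to the gene (assumed: one per gene)
    start: int      # 0-based, inclusive, on the + strand
    end: int        # 0-based, exclusive, on the + strand
    strand: str     # "+" or "-"


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def group_by_domain(genes: list[Gene]) -> dict[str, list[Gene]]:
    """Group genes that share a CDD domain into putative homologous sets."""
    groups: dict[str, list[Gene]] = defaultdict(list)
    for gene in genes:
        groups[gene.domain_id].append(gene)
    return dict(groups)


def upstream_region(genome: str, gene: Gene, length: int) -> str:
    """Return up to `length` nt immediately 5' of the gene, read on the gene's strand."""
    if gene.strand == "+":
        return genome[max(0, gene.start - length):gene.start]
    # On the - strand, "upstream" lies to the right on the + strand.
    return reverse_complement(genome[gene.end:gene.end + length])


if __name__ == "__main__":
    genome = "AAAACCCCGGGGTTTT"
    genes = [
        Gene("g1", "domA", 8, 12, "+"),
        Gene("g2", "domA", 0, 4, "-"),
        Gene("g3", "domB", 12, 16, "+"),
    ]
    for domain, members in group_by_domain(genes).items():
        print(domain, [upstream_region(genome, g, 4) for g in members])
```

Reading minus-strand regions as reverse complements keeps every upstream sequence in the same 5'-to-3' orientation as its gene, which is what a downstream motif search would expect.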
Positive Controls: Discovering Known Rfam Families
To roughly assess the sensitivity with which the method discovers true ncRNAs, we looked at its recovery of known Rfam families.
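Recovering known families amounts to comparing predicted family members with Rfam's annotated members. The Python below is a minimal sketch that assumes a prediction matches an annotated member when the two overlap by at least one nucleotide on the same sequence. The `Hit` record, the strand-agnostic overlap rule, and the function names are assumptions for illustration; the paper's actual matching criterion may be stricter.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical hit record: a region on a named sequence."""
    seq_id: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def overlaps(a: Hit, b: Hit) -> bool:
    """True if two hits lie on the same sequence and share at least one position."""
    return a.seq_id == b.seq_id and a.start < b.end and b.start < a.end


def membership_accuracy(predicted: list[Hit], reference: list[Hit]) -> tuple[float, float]:
    """Return (sensitivity, specificity) of predicted family members.

    sensitivity: fraction of reference members overlapped by at least one prediction.
    specificity: fraction of predictions that overlap at least one reference member.
    """
    found = sum(any(overlaps(r, p) for p in predicted) for r in reference)
    correct = sum(any(overlaps(p, r) for r in reference) for p in predicted)
    sensitivity = found / len(reference) if reference else 0.0
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity


if __name__ == "__main__":
    reference = [Hit("seqA", 100, 200), Hit("seqA", 500, 600), Hit("seqB", 0, 80)]
    predicted = [Hit("seqA", 150, 260), Hit("seqB", 10, 70), Hit("seqB", 300, 400)]
    print(membership_accuracy(predicted, reference))
```

The all-pairs check is quadratic, which is fine for the members of a single family; a sorted sweep or an interval tree would scale better to genome-wide hit lists.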
Summary
Recent discoveries of novel noncoding RNAs (ncRNAs) such as microRNAs and riboswitches suggest that ncRNAs have important and diverse functional and regulatory roles that impact gene transcription, translation, localization, replication, and degradation [1,2,3]. More recent work has extended computational ncRNA searches to eukaryotes [9,10,11,12,13], recovering a large number of known microRNAs while producing thousands of novel ncRNA candidates. With some exceptions, such as [4] and [13], these approaches follow a similar paradigm: they search for conserved secondary structures in multiple-sequence alignments constructed from sequence similarity alone. These schemes use measures such as mutual information between pairs of alignment columns to detect base-paired regions. Even local misalignments can weaken this key structural signal, making the methods sensitive to alignment quality, which is especially problematic for diverged sequences.
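To illustrate the alignment-based signal discussed above (this is not part of the authors' structure-oriented, alignment-free pipeline), the Python below computes the standard mutual information between two alignment columns i and j, M_ij = Σ_{x,y} f_ij(x,y) log2[f_ij(x,y) / (f_i(x) f_j(y))], where the frequencies are taken over the aligned sequences. Gaps are treated as ordinary symbols in this sketch, and the function name is hypothetical.

```python
import math
from collections import Counter


def column_mutual_information(col_i: str, col_j: str) -> float:
    """Mutual information, in bits, between two alignment columns.

    Each column is given as a string holding one symbol per aligned sequence,
    so col_i[k] and col_j[k] come from the same sequence k.
    """
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    f_i = Counter(col_i)                # symbol counts in column i
    f_j = Counter(col_j)                # symbol counts in column j
    f_ij = Counter(zip(col_i, col_j))   # joint counts of (x, y) pairs
    mi = 0.0
    for (x, y), count in f_ij.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((f_i[x] / n) * (f_j[y] / n)))
    return mi


if __name__ == "__main__":
    # Perfect Watson-Crick covariation across eight sequences.
    print(column_mutual_information("ACGUACGU", "UGCAUGCA"))
    # Same columns after the j-symbols of the last two sequences are swapped,
    # as a local misalignment might do.
    print(column_mutual_information("ACGUACGU", "UGCAUGAC"))
    # A fully conserved pair carries no covariation signal.
    print(column_mutual_information("GGGGGGGG", "CCCCCCCC"))
```

The perfectly covarying columns score 2 bits, the disturbed pair drops to 1.5 bits, and the conserved pair scores 0. The last case also shows why covariation alone cannot flag base pairs in highly conserved regions.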