A Computational Pipeline for High-Throughput Discovery of cis-Regulatory Noncoding RNA in Prokaryotes

Zizhen Yao, Ronald Breaker, Walter L Ruzzo, Martin Tompa, Jeffrey Barrick, Zasha Weinberg, Shane Neph

doi:10.1371/journal.pcbi.0030126

Open Access

Abstract

Noncoding RNAs (ncRNAs) are important functional RNAs that do not code for proteins. We present a highly efficient computational pipeline for discovering cis-regulatory ncRNA motifs de novo. The pipeline differs from previous methods in that it is structure-oriented, does not require a multiple-sequence alignment as input, and is capable of detecting RNA motifs with low sequence conservation. We also integrate RNA motif prediction with RNA homolog search, which improves the quality of the RNA motifs significantly. Here, we report the results of applying this pipeline to Firmicute bacteria. Our top-ranking motifs include most known Firmicute elements found in the RNA family database (Rfam). Comparing our motif models with Rfam's hand-curated motif models, we achieve high accuracy in both membership prediction and base-pair–level secondary structure prediction (at least 75% average sensitivity and specificity on both tasks). Of the ncRNA candidates not in Rfam, we find compelling evidence that some of them are functional, and analyze several potential ribosomal protein leaders in depth.

Highlights

Recent discoveries of novel noncoding RNAs such as microRNAs and riboswitches suggest that ncRNAs have important and diverse functional and regulatory roles that impact gene transcription, translation, localization, replication, and degradation [1,2,3].
Our pipeline consists of the following major steps. (See Figure 1, Materials and Methods, and the online supplement at http://bio.cs.washington.edu/supplements/yzizhen/pipeline for more details.) First, we used the National Center for Biotechnology Information's (NCBI's) Conserved Domain Database (CDD) [16] to identify homologous gene sets.
Positive Controls: Discovering Known RNA Family Database (Rfam) Families. To roughly assess the sensitivity with which the method discovers true ncRNAs, we looked at its recovery of known Rfam families.

Summary

Introduction

Recent discoveries of novel noncoding RNAs (ncRNAs) such as microRNAs and riboswitches suggest that ncRNAs have important and diverse functional and regulatory roles that impact gene transcription, translation, localization, replication, and degradation [1,2,3]. More recent work has extended these searches to eukaryotes [9,10,11,12,13], discovering a large number of known microRNAs while producing thousands of novel ncRNA candidates. With some exceptions, such as [4] and [13], these approaches follow a similar paradigm: they search for conserved secondary structures in multiple-sequence alignments that are constructed from sequence similarity alone. These schemes use measures such as mutual information between pairs of alignment columns to signal base-paired regions. Even local misalignments may weaken this key structural signal, which makes the methods sensitive to alignment quality. This is especially problematic for diverged sequences.
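
To make the covariation signal used by these alignment-based schemes concrete, here is a minimal Python sketch of mutual information between two alignment columns. It illustrates the general measure only. It is not the pipeline described in this paper, which does not take a multiple-sequence alignment as input. The function name `column_mutual_information`, the gap symbol `-`, and the toy alignments are assumptions made for this example.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j.

    `alignment` is a list of equal-length aligned sequences (hypothetical
    input format). Rows with a gap ('-') in either column are skipped.
    """
    pairs = [(s[i], s[j]) for s in alignment if s[i] != "-" and s[j] != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    freq_i = Counter(a for a, _ in pairs)   # single-column counts, column i
    freq_j = Counter(b for _, b in pairs)   # single-column counts, column j
    freq_ij = Counter(pairs)                # joint counts
    mi = 0.0
    for (a, b), count in freq_ij.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((freq_i[a] / n) * (freq_j[b] / n)))
    return mi


# Toy alignment: columns 0 and 4 always form a complementary pair
# (G-C, C-G, A-U, U-A), while column 1 is fully conserved.
toy = ["GAAAC", "CAAAG", "AAAAU", "UAAAA"]

# Same sequences, but the last row is shifted right by one column,
# so its U-A pair no longer lines up with columns 0 and 4.
shifted = ["GAAAC", "CAAAG", "AAAAU", "-UAAA"]

mi_paired = column_mutual_information(toy, 0, 4)
mi_conserved = column_mutual_information(toy, 0, 1)
mi_shifted = column_mutual_information(shifted, 0, 4)
```

In the toy alignment the covarying columns score higher than a column paired with a fully conserved one, and the shifted row lowers the score for columns 0 and 4. This is a small-scale version of the sensitivity to alignment quality noted above.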

Journal: PLoS Computational Biology	Publication Date: Jul 1, 2007
Citations: 130	License type: CC BY 4.0

Similar Papers

A Structure-Based Flexible Search Method for Motifs in RNA
Isana Veksler-Lublinsky ... Klara Kedem
Journal of Computational Biology | VOL. 14 | 01 Sep 2007

Discovering cis-Regulatory RNAs in Shewanella Genomes by Support Vector Machines
Xing Xu ... Gary D Stormo
PLoS Computational Biology | VOL. 5 | 03 Apr 2009

Comparative genomics of metabolic capacities of regulons controlled by cis-regulatory RNA motifs in bacteria
Eric I Sun ... Semen A Leyn
BMC Genomics | VOL. 14 | 02 Sep 2013

Prediction of secondary structural content of proteins from their amino acid composition alone. II. The paradox with secondary structural class.
Frank Eisenhaber ... Cornelius Frömmel
Proteins | VOL. 25 | 01 Jun 1996
