Abstract

Bacterial small non-coding RNAs (sRNAs) are not translated into proteins, but act as functional RNAs. They are involved in diverse biological processes such as virulence, stress response and quorum sensing. Several high-throughput techniques have enabled the identification of sRNAs in bacteria, but experimental detection remains challenging and is grossly incomplete for most species. Thus, there is a need to develop computational tools to predict bacterial sRNAs. Here, we propose a computational method to identify sRNAs in bacteria using a support vector machine (SVM) classifier. The primary sequence and secondary structure features of experimentally validated sRNAs of Salmonella Typhimurium LT2 (SLT2) were used to build the optimal SVM model. We found that a tri-nucleotide composition feature of sRNAs achieved an accuracy of 88.35% for SLT2. We also validated the SVM model on the experimentally detected sRNAs of E. coli and Salmonella Typhi. The proposed model attained accuracies of 81.25% and 88.82% for E. coli K-12 and S. Typhi Ty2, respectively. We confirmed that this method significantly improved the identification of sRNAs in bacteria. Furthermore, we used a sliding window-based method and identified sRNAs from the complete genomes of SLT2, S. Typhi Ty2 and E. coli K-12 with sensitivities of 89.09%, 83.33% and 67.39%, respectively.

Highlights

  • Bacterial small non-coding RNAs are transcripts that, instead of encoding proteins, function directly at the RNA level in the cell[1,2]

  • The structured RNA (RNA), coding RNA (COD) and something else (OTH) models assume, respectively, that the mutation pattern is significantly conserved in homologous RNA secondary structures, that aligned sequences encode homologous proteins, and that mutations occur in a simple position-independent manner

  • Experimental techniques are ideal for identifying small non-coding RNAs (sRNAs) and exploring their roles in individual species


Summary

Introduction

Bacterial small non-coding RNAs (sRNAs) are transcripts that, instead of encoding proteins, function directly at the RNA level in the cell[1,2]. They are usually 50–250 nucleotides in length. QRNA[11] used pairwise alignments to identify novel sRNAs in bacteria. This technique employed pair hidden Markov models (pair-HMMs) and a pair stochastic context-free grammar (pair-SCFG) to classify an alignment as structured RNA (RNA), coding RNA (COD) or something else (OTH). ZMFold[14] offered a new shuffling program in Perl (shuffle-pair.pl) for pairwise alignments that simultaneously preserves key features of the alignment. It used an alignment dataset, along with real and shuffled genomic sequences, as input for a panel of published tools to identify novel non-coding RNAs. Nucleotide composition values were significantly higher in the negative (−ve) set than in the positive (+ve) set.
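
The tri-nucleotide composition feature and the sliding window scan mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the use of overlapping tri-nucleotides expressed as percentages, and the window and step sizes are all assumptions made for illustration, since the text above does not specify them. The resulting 64-element vectors are the kind of input that could be passed to an SVM classifier.

```python
from itertools import product

# All 64 possible tri-nucleotides in a fixed order (DNA alphabet; U is mapped to T).
ALPHABET = "ACGT"
TRINUCLEOTIDES = ["".join(p) for p in product(ALPHABET, repeat=3)]


def trinucleotide_composition(seq):
    """Return a 64-element list of overlapping tri-nucleotide frequencies in percent.

    Assumption: overlapping 3-mers, normalised by the number of valid 3-mers.
    """
    seq = seq.upper().replace("U", "T")
    counts = dict.fromkeys(TRINUCLEOTIDES, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:  # skip 3-mers containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    return [100.0 * counts[k] / total for k in TRINUCLEOTIDES]


def sliding_windows(genome, window=100, step=50):
    """Yield (start, subsequence) pairs; window and step values are hypothetical."""
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, genome[start:start + window]


if __name__ == "__main__":
    toy_genome = "ATGCGTACGTTAGC" * 20  # made-up sequence, not real data
    for start, sub in sliding_windows(toy_genome):
        features = trinucleotide_composition(sub)
        print(start, len(features))
```

Each window yields one feature vector, so a trained classifier could score every window and report the high-scoring ones as candidate sRNA regions. How overlapping hits are merged into final predictions is not described in the text above and is left out of the sketch.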
