Abstract

Noncoding RNAs (ncRNAs) play important regulatory and functional roles in microorganisms, including regulation of gene expression, signaling, protein synthesis, and RNA processing. Hence, their classification and quantification are central to understanding the function of a microbial community. However, most current metagenomic sequencing technologies generate short reads, which may contain only part of a secondary structure, and this complicates ncRNA homology detection. Meanwhile, de novo assembly of metagenomic sequencing data remains challenging for complex communities. To tackle these challenges, we developed a novel algorithm called DRAGoM (Detection of RNA using Assembly Graph from Metagenomic data). DRAGoM first constructs a hybrid graph by merging an assembly string graph and an assembly de Bruijn graph. Then, it classifies paths in the hybrid graph and their constituent reads into different ncRNA families based on both sequence and structural homology. Our benchmark experiments show that DRAGoM can improve performance and robustness over traditional approaches in the classification and quantification of a wide class of ncRNA families.

Highlights

  • Noncoding RNAs can perform versatile functional roles and their importance in cellular physiology is being increasingly recognized

  • For 16S rRNA, DRAGoM had the best performance in F-score (96.4%, Table 3) but the second-best performance in area under the curve (AUC) (96.8%, compared to the best performance of 97.6% made by CMSearch)

  • We have demonstrated using benchmark data that DRAGoM can improve ncRNA homology search as compared to the traditional read-based and assembly-based strategies
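
The F-score cited in the highlights can be illustrated with a minimal sketch. It assumes the reported value is the standard F1 score (the harmonic mean of precision and recall), which the text here does not state explicitly. The function name `f_score` and the read counts are hypothetical and are not taken from the paper's benchmark.

```python
def f_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp: reads correctly assigned to an ncRNA family (true positives)
    fp: reads wrongly assigned to the family (false positives)
    fn: reads of the family that were missed (false negatives)
    Assumption: the F-score in the text is the standard F1.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only.
example = f_score(tp=90, fp=10, fn=10)
```

With these toy counts, precision and recall are both 0.9, so the resulting F1 is also about 0.9. AUC, the other metric mentioned, is computed from ranked scores rather than from a single set of counts, so it is not covered by this sketch.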

Summary

Introduction

Noncoding RNAs (ncRNAs) can perform versatile functional roles, and their importance in cellular physiology is being increasingly recognized. The richness of microbial genomic data offers a great opportunity to study ncRNAs. The diversity and richness of microbial ncRNA function revealed by analyzing metagenomic data are beyond our existing knowledge (Weinberg et al, 2010; Nawrocki and Eddy, 2013a; Tobar-Tosse et al, 2013; Stav et al, 2019), including many long ncRNA classes such as OLE, GOLLD, and HEARO (Harris and Breaker, 2018). These discoveries underscore the importance of ncRNA functions in bacterial physiology, ecology, and interaction with the environment.
