Abstract

The number of transcriptomic sequencing projects for non-model organisms continues to grow rapidly. Because non-coding RNAs (ncRNAs) are highly abundant in living organisms and play important roles in many biological processes, identifying fragmentary ncRNA members in small RNA-seq data is an important step in the analysis that follows next-generation sequencing (NGS). However, state-of-the-art ncRNA search tools are not optimized for NGS data, especially for very short reads. In this work, we propose and implement a comprehensive ncRNA classification tool (RNA-CODE) for very short reads. RNA-CODE is specifically designed for ncRNA identification in NGS data that lack quality reference genomes. Given a set of short reads, our tool classifies the reads into different ncRNA families. The classification results can be used to quantify the expression levels of different ncRNA types in RNA-seq data and to profile ncRNA composition in metagenomic data. Experimental results from applying RNA-CODE to RNA-seq data of Arabidopsis and to a metagenomic data set sampled from human guts demonstrate that RNA-CODE competes favorably with other tools in both sensitivity and specificity. The source code of RNA-CODE can be downloaded at http://www.cse.msu.edu/~chengy/RNA_CODE.

Highlights

  • Noncoding RNAs, which function directly as RNAs without being translated into proteins, play diverse and crucial roles in many biochemical processes

  • We introduce RNA-CODE, a comprehensive noncoding RNA (ncRNA) classification tool for short reads, designed for ncRNA identification in next-generation sequencing (NGS) data sets that lack reference genomes

  • We tested RNA-CODE on annotating reads sequenced from different ncRNA genes, including house-keeping RNAs and miRNAs, in RNA-seq data of the model organism Arabidopsis thaliana


Summary

Introduction

Noncoding RNAs (ncRNAs), which function directly as RNAs without being translated into proteins, play diverse and crucial roles in many biochemical processes. The development of next-generation sequencing (NGS) technologies has made more sensitive and comprehensive ncRNA annotation possible. Deep sequencing of the transcriptomes of various organisms has revealed that a large portion of transcriptomic data cannot be mapped back to annotated protein-coding genes in the reference genome, indicating that those transcripts may contain transcribed ncRNAs [3]. Identifying different types of ncRNAs and quantifying their expression levels in different tissues, conditions, and developmental stages have generated new knowledge about ncRNA functions. Beyond RNA-seq data, ncRNA identification is also important for analyzing metagenomic data, which contain sequenced metagenomes from various environmental samples. ncRNA annotation is therefore an important component of post-NGS analysis.

