Abstract

Background

There are many different types of microRNAs (miRNAs), and elucidating their functions is still under intensive research. A fundamental step in the functional annotation of a new miRNA is to classify it into characterized miRNA families, such as those in Rfam and miRBase. With the accumulation of annotated miRNAs, it becomes possible to use deep learning-based models to classify different types of miRNAs. In this work, we investigate several key issues associated with the successful application of deep learning models to miRNA classification. First, as secondary structure conservation is a prominent feature of noncoding RNAs, including miRNAs, we examine whether secondary structure-based encoding improves classification accuracy. Second, as there are many more non-miRNA sequences than miRNAs, instead of assigning a single negative class to all non-miRNA sequences, we test whether the softmax output can distinguish in-distribution from out-of-distribution samples. Finally, we investigate whether deep learning models can correctly classify sequences from small miRNA families.

Results

We present trained convolutional neural network (CNN) models for classifying miRNAs using different feature learning and encoding methods. In the first method, we explicitly encode the predicted secondary structure in a matrix. In the second method, we use only the primary sequence information, encoded as a one-hot matrix. In addition, to reject sequences that should not be classified into the targeted miRNA families, we use a threshold derived from the softmax layer to exclude out-of-distribution sequences, a feature that is important for making the model useful on real transcriptomic data. Comparison with state-of-the-art ncRNA classification tools such as Infernal shows that our method can achieve comparable sensitivity and accuracy while being significantly faster.

Conclusion

Automatic feature learning in CNNs can lead to better classification accuracy and sensitivity for miRNA classification and annotation. The trained models and associated code are freely available at https://github.com/HubertTang/DeepMir.
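The two sequence-side ideas in the Results paragraph, one-hot encoding of the primary sequence and rejecting out-of-distribution inputs with a softmax threshold, can be illustrated with the minimal sketch below. It is not the DeepMir implementation: the `ACGU` column order, the zero-padding to a fixed `max_len`, and the helper names `one_hot`, `softmax` and `classify_or_reject` are hypothetical, and the `threshold` value is arbitrary (in practice it would be chosen on held-out data). The CNN itself is omitted; `logits` stands in for the network's raw per-family scores.

```python
import math

# Hypothetical column order; the encoding used by DeepMir may differ.
ALPHABET = "ACGU"


def one_hot(seq, max_len):
    """Encode an RNA sequence as a max_len x 4 one-hot matrix (list of lists).

    T is treated as U; unknown characters (e.g. N) and padding positions
    become all-zero rows. Sequences longer than max_len are truncated.
    """
    matrix = [[0] * len(ALPHABET) for _ in range(max_len)]
    for i, base in enumerate(seq.upper().replace("T", "U")[:max_len]):
        j = ALPHABET.find(base)
        if j >= 0:
            matrix[i][j] = 1
    return matrix


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_or_reject(logits, threshold):
    """Return the index of the most probable family, or None when the top
    softmax probability is below `threshold` (treated as out-of-distribution)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best if probs[best] >= threshold else None


if __name__ == "__main__":
    print(one_hot("ACGT", 6))
    print(classify_or_reject([4.0, 0.5, 0.1], threshold=0.9))  # confident: family 0
    print(classify_or_reject([1.0, 0.9, 0.8], threshold=0.9))  # flat scores: None
```

A confident score vector such as `[4.0, 0.5, 0.1]` gives a top probability of about 0.95 and is assigned to family 0, while a nearly flat vector is rejected. This mirrors the design choice described above: no explicit negative class is trained, and the decision to discard a sequence rests only on how peaked the softmax output is.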

Highlights

  • Non-coding RNAs refer to the RNAs that do not encode proteins and function directly as RNAs; many small noncoding RNAs (ncRNAs) play important roles in gene regulation

  • This work is mainly concerned with one type of small ncRNA, microRNA, which acts as a key regulator of gene expression

  • We explore whether using a convolutional neural network (CNN) has advantages over powerful covariance models in distinguishing different types of miRNAs

Summary

Introduction

Non-coding RNAs (ncRNAs) refer to the RNAs that do not encode proteins and function directly as RNAs. Genome annotation of many different genomes shows that ncRNAs are ubiquitous and have various important functions. Many small ncRNAs play important roles in gene regulation. This work is mainly concerned with one type of small ncRNA, microRNA (miRNA), which acts as a key regulator of gene expression at the post-transcriptional level in different species [2,3,4,5]. As a miRNA can bind to multiple mRNA transcripts, a large number of protein-coding genes can be regulated by miRNAs [6, 7].

There are many different types of miRNAs, and elucidating their functions is still under intensive research. With the accumulation of annotated miRNAs, it becomes possible to use deep learning-based models to classify different types of miRNAs. In this work, we investigate several key issues associated with the successful application of deep learning models to miRNA classification, including whether deep learning models can correctly classify sequences from small miRNA families.
