Abstract

Background

Distinguishing pre-microRNAs (precursor microRNAs) from length-similar pseudo pre-microRNAs can reveal more about the regulatory mechanisms of RNA biological processes. Machine learning techniques have been widely applied to this challenging problem. However, most of them focus mainly on the secondary structure information of pre-microRNAs while ignoring sequence-order information and sequence evolution information.

Results

We use new features for machine learning algorithms to improve classification performance by characterizing both sequence-order and evolution information and secondary structure graphs. We developed three steps to extract these features from pre-microRNAs. We first extract features from PSI-BLAST profiles and Hilbert-Huang transforms, which contain rich sequence evolution information and sequence-order information, respectively. We then obtain properties of small molecular networks of pre-microRNAs, which contain refined secondary structure information. These structural features are carefully generated so that they depict both global and local characteristics of pre-microRNAs. In total, our feature space covers 591 features. The maximum relevance and minimum redundancy (mRMR) feature selection method is applied before a support vector machine (SVM) is used as the classifier. The resulting classification model is named MicroRNA-NHPred. Its performance is high and stable and better than that of state-of-the-art methods, achieving an accuracy of up to 94.83% on the same benchmark datasets.

Conclusions

The high prediction accuracy of our proposed method is attributed to the design of a comprehensive feature set on the sequences and secondary structures, which characterizes sequence evolution information, sequence-order information, and global and local information of pre-microRNA secondary structures. MicroRNA-NHPred is a valuable method for pre-microRNA identification. The source code of our method can be downloaded from https://github.com/myl446/MicroRNA-NHPred.
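The abstract does not spell out how the small molecular networks of a pre-microRNA are built. As a rough illustration only, the sketch below assumes one simple construction: each nucleotide is a node, consecutive nucleotides are joined by backbone edges, and base-paired positions read from a dot-bracket secondary structure are joined by pairing edges. The function names and the chosen properties (mean degree, density, paired fraction, and so on) are hypothetical and are not the MicroRNA-NHPred feature definitions.

```python
from typing import Dict, List, Set, Tuple


def dot_bracket_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return (i, j) index pairs for matched parentheses in a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def structure_graph_features(structure: str) -> Dict[str, float]:
    """Hypothetical global graph properties of a nucleotide-level structure graph."""
    n = len(structure)
    adj: List[Set[int]] = [set() for _ in range(n)]
    # Backbone edges: nucleotide i is covalently linked to nucleotide i + 1.
    for i in range(n - 1):
        adj[i].add(i + 1)
        adj[i + 1].add(i)
    # Pairing edges: one edge per base pair (sets avoid double-counting).
    pairs = dot_bracket_pairs(structure)
    for i, j in pairs:
        adj[i].add(j)
        adj[j].add(i)
    degrees = [len(neighbors) for neighbors in adj]
    n_edges = sum(degrees) // 2
    max_edges = n * (n - 1) / 2
    return {
        "n_nodes": float(n),
        "n_edges": float(n_edges),
        "n_base_pairs": float(len(pairs)),
        "paired_fraction": 2 * len(pairs) / n if n else 0.0,
        "mean_degree": sum(degrees) / n if n else 0.0,
        "max_degree": float(max(degrees, default=0)),
        "density": n_edges / max_edges if max_edges else 0.0,
    }
```

For a toy hairpin `((((...))))`, this sketch reports 4 base pairs and 14 edges (10 backbone plus 4 pairing). Local features could be obtained the same way by applying the function to sub-structures such as stems or loops, again as an assumption rather than the published procedure.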

Highlights

  • Distinguishing pre-microRNAs from length-similar pseudo pre-microRNAs can reveal more about the regulatory mechanisms of RNA biological processes

  • To describe local (short-range) sequence-order information and evolution information of pre-microRNAs, we introduce PSI-BLAST profiles into the analysis of pre-microRNAs for the first time

  • Feature selection by maximum relevance and minimum redundancy (mRMR)

  • We develop three steps to extract 591 features, which are listed in Additional file 1
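The mRMR step named in the highlights can be sketched as a greedy search: at each step, pick the feature whose mutual information with the class label (relevance) minus its mean mutual information with the already selected features (redundancy) is largest. This is the difference (MID) form of mRMR. The sketch assumes the features have already been discretized, because it estimates mutual information by simple counting; the function names are hypothetical and not taken from the MicroRNA-NHPred code.

```python
import math
from collections import Counter
from typing import List, Sequence


def mutual_information(x: Sequence, y: Sequence) -> float:
    """Mutual information (in nats) between two discrete sequences of equal length."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    n = len(x)
    if n == 0:
        return 0.0
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # Sum of p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_select(features: List[Sequence], labels: Sequence, k: int) -> List[int]:
    """Greedy mRMR (MID criterion): return indices of up to k selected features."""
    relevance = [mutual_information(f, labels) for f in features]
    selected: List[int] = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        best, best_score = -1, -math.inf
        for j in remaining:  # ascending order, so ties keep the lowest index
            redundancy = (
                sum(mutual_information(features[j], features[s]) for s in selected)
                / len(selected)
                if selected
                else 0.0
            )
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a toy case where the label encodes two independent bits `a` and `b`, and the candidate features are `[a, a, b]`, this sketch selects `[0, 2]`: once `a` is chosen, its duplicate adds no new information, while `b` does. In a full pipeline the selected columns would then be passed to the SVM classifier.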


Summary

Introduction

Distinguishing pre-microRNAs (precursor microRNAs) from length-similar pseudo pre-microRNAs can reveal more about the regulatory mechanisms of RNA biological processes. Mature microRNAs (miRNAs) are small, single-stranded, non-coding RNAs (about 22 nucleotides in length) that play significant regulatory roles in various biological processes of animals, plants and viruses [1, 2]. miRNAs also exist in two other forms: primary miRNAs (pri-miRNAs) and precursor miRNAs (pre-miRNAs). Precursor miRNAs were the earliest form to be widely studied, and many commercial miRNA libraries take this form. With the advent of the post-genome era and advances in sequencing technology, finding all forms of miRNAs among millions of reads has become a challenging topic in bioinformatics. Current computational methods focus on identifying pre-miRNAs rather than mature miRNAs.
