Abstract

Background: MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses. These molecules silence their target genes by degrading the target transcripts or suppressing their translation. Studies have shown that miRNAs are involved in biological responses to a variety of biotic and abiotic stresses. Identifying these molecules and their targets can aid the understanding of regulatory processes. Recently, prediction methods based on machine learning have been widely used for miRNA prediction. However, most of these methods were designed for mammalian miRNA prediction, and few are available for predicting miRNAs in the pre-miRNAs of specific plant species. Although the complete Solanum lycopersicum genome has been published, only 77 Solanum lycopersicum miRNAs have been identified, far fewer than the estimated number. Therefore, it is essential to develop a prediction method based on machine learning to identify new plant miRNAs.

Results: A novel classification model based on a support vector machine (SVM) was trained to distinguish real from pseudo plant pre-miRNAs and to identify their miRNAs. An initial set of 152 novel features related to sequential structures was used to train the model. Feature selection with the Back Support Vector Machine-Recursive Feature Elimination (B-SVM-RFE) method yielded a best subset of 47 features for the classification of plant pre-miRNAs and a subset of 63 features for the classification of plant miRNAs. We then developed an integrated classification model, miPlantPreMat, which comprises MiPlantPre and MiPlantMat, to identify plant pre-miRNAs and their miRNAs. This model achieved approximately 90% accuracy on datasets from nine plant species: Arabidopsis thaliana, Glycine max, Oryza sativa, Physcomitrella patens, Medicago truncatula, Sorghum bicolor, Arabidopsis lyrata, Zea mays and Solanum lycopersicum. Using miPlantPreMat, 522 Solanum lycopersicum miRNAs were identified in the Solanum lycopersicum genome sequence.

Conclusions: We developed an integrated classification model, miPlantPreMat, based on structure-sequence features and SVM. The model was used to identify both plant pre-miRNAs and the corresponding mature miRNAs. An improved feature selection method was proposed, resulting in high classification accuracy, sensitivity and specificity.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-014-0423-x) contains supplementary material, which is available to authorized users.
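The accuracy, sensitivity and specificity mentioned in the abstract are normally derived from a confusion matrix, with real pre-miRNAs as the positive class and pseudo pre-miRNAs as the negative class. The sketch below shows those standard definitions. The function name `classification_metrics` and the counts in the usage example are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    tp / fn: real pre-miRNAs predicted as real / as pseudo
    tn / fp: pseudo pre-miRNAs predicted as pseudo / as real
    """
    total = tp + fp + tn + fn
    return {
        # Fraction of all sequences that were classified correctly.
        "accuracy": (tp + tn) / total,
        # Fraction of real pre-miRNAs that were recovered (true positive rate).
        "sensitivity": tp / (tp + fn),
        # Fraction of pseudo pre-miRNAs that were rejected (true negative rate).
        "specificity": tn / (tn + fp),
    }


# Hypothetical counts for illustration only (not taken from the paper).
metrics = classification_metrics(tp=90, fp=15, tn=85, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy matters when the real and pseudo classes are imbalanced, because accuracy alone can hide poor performance on the smaller class.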

Highlights

  • MicroRNAs are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses

  • We focus on building a machine learning model that classifies real and pseudo plant pre-miRNAs together with their mature miRNAs

  • To obtain the highest classification performance, three feature subset selection methods were used in this paper: Principal Component Analysis (PCA), Correlation-based Feature Subset Selection (CFS) [33] and B-SVM-RFE
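Of the three selection methods named above, recursive feature elimination is the easiest to illustrate in a few lines. The sketch below is the standard SVM-RFE procedure (train a linear SVM, drop the feature with the smallest squared weight, repeat), not the authors' B-SVM-RFE variant, whose details are not given in this summary. The function names, the hyperparameters and the toy dataset are all assumptions made for illustration; in the toy data, feature 0 carries the class signal and features 1 and 3 are uninformative by construction.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Fit a linear soft-margin SVM by full-batch subgradient descent on the hinge loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        # Gradient of the L2 regularizer (lam / 2) * ||w||^2.
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only margin violators contribute to the hinge subgradient
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, n_keep):
    """Return the original indices of the n_keep features that survive elimination."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w, _ = train_linear_svm(X_sub, y)
        # Rank features by squared weight and drop the least important one.
        worst = min(range(len(remaining)), key=lambda k: w[k] ** 2)
        del remaining[worst]
    return remaining


# Toy data (hypothetical): rows are samples, columns are features, labels are +1 / -1.
X = [
    [ 2,  1,  1,  1],
    [ 2, -1,  1,  1],
    [ 2,  1,  1, -1],
    [ 2, -1, -1, -1],
    [-2,  1, -1,  1],
    [-2, -1, -1,  1],
    [-2,  1, -1, -1],
    [-2, -1,  1, -1],
]
y = [1, 1, 1, 1, -1, -1, -1, -1]

selected = svm_rfe(X, y, n_keep=2)
print("retained feature indices:", selected)
```

Retraining after every removal is what distinguishes recursive elimination from a one-shot ranking: the weights of the surviving features are re-estimated each round, so a feature's importance is always judged in the context of the current subset.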


Summary

Introduction

MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides (nt) in length that play important roles at the post-transcriptional level in animals, plants and viruses [1]. These molecules are first cut from a stem-loop structure by the RNase III enzyme Dicer. Mature miRNAs are released from pre-miRNAs with hairpin structures by a Dicer-like enzyme. It is essential to develop a prediction method based on machine learning to identify new plant miRNAs.

