Abstract

MicroRNAs (miRNAs) are a class of short, non-coding RNAs that play regulatory roles in a wide variety of biological processes, such as plant growth and abiotic stress responses. Although several computational tools have been developed to identify primary miRNAs and precursor miRNAs (pre-miRNAs), very few can locate mature miRNAs within plant pre-miRNAs. This manuscript introduces miRLocator, a novel algorithm for predicting mature miRNAs that is based on machine learning techniques and on sequence and structural features extracted from miRNA:miRNA* duplexes. To address the class imbalance problem (few real miRNAs and a large number of pseudo miRNAs), the prediction models in miRLocator were optimized by considering critical (and often ignored) factors that can markedly affect the prediction accuracy of mature miRNAs, including the choice of machine learning algorithm and the ratio between positive and negative training samples. Ten-fold cross-validation on 5854 experimentally validated miRNAs from 19 plant species showed that miRLocator performed better than the state-of-the-art miRNA predictor miRdup in locating mature miRNAs within plant pre-miRNAs. miRLocator will aid researchers interested in discovering miRNAs in model and non-model plant species.
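The class-imbalance setup above (one real mature miRNA per hairpin against N pseudo ones, with the pseudo segments drawn from random positions within the hairpin, as the Highlights describe) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not miRLocator's implementation: `sample_pseudo_segments` and `build_training_set` are hypothetical helpers, each pseudo segment is assumed to be a random window with the same length as the real mature miRNA, and only the window starting exactly at the real mature position is excluded. The sketch handles mature segments only; building the miRNA:miRNA* duplex and its structural features would additionally require a secondary-structure prediction, which is omitted here.

```python
import random


def sample_pseudo_segments(hairpin, true_start, mature_len, n, rng):
    """Return up to n (start, sequence) pairs of pseudo mature segments.

    Hypothetical helper. Assumption: a pseudo segment is any same-length
    window of the hairpin except the one at the real mature start, so
    windows that partially overlap the real miRNA are still allowed.
    """
    candidates = [s for s in range(len(hairpin) - mature_len + 1) if s != true_start]
    k = min(n, len(candidates))
    starts = rng.sample(candidates, k)  # sampling without replacement
    return [(s, hairpin[s:s + mature_len]) for s in starts]


def build_training_set(records, ratio, seed=0):
    """Label real mature segments 1 and pseudo segments 0 at roughly 1:ratio.

    records: iterable of (hairpin, true_start, mature_len) tuples (assumed layout).
    """
    rng = random.Random(seed)
    samples = []
    for hairpin, true_start, mature_len in records:
        samples.append((hairpin[true_start:true_start + mature_len], 1))
        for _, seq in sample_pseudo_segments(hairpin, true_start, mature_len, ratio, rng):
            samples.append((seq, 0))
    return samples


if __name__ == "__main__":
    # Toy hairpin and mature position, invented for illustration only.
    toy = [("UGGAGAAGCAGGGCACGUGCAAAGCUUGCACGUGCCCUGCUUCUCCAUU", 0, 21)]
    for ratio in (1, 5, 10):
        data = build_training_set(toy, ratio)
        print(ratio, sum(lab for _, lab in data), sum(1 - lab for _, lab in data))
```

Raising `ratio` from 1 to 50 reproduces the kind of 1:N imbalance the abstract refers to; a hairpin that is too short to supply N distinct windows simply yields fewer negatives.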

Highlights

  • MicroRNAs are small (~22 nucleotides), non-coding RNA molecules with important regulatory roles in gene expression

  • We generated a positive sample set consisting of 4505 miRNA duplexes supported by experimentally validated miRNAs and a negative sample set consisting of approximately 4505 × N (N = 1, 5, 10 and 50) pseudo miRNA duplexes derived from randomly selected segments within the pre-miRNA hairpins

  • We encoded each sample with 440 numeric features (Table 1 and S2 Table) and performed ten-fold cross-validation and ROC analyses to evaluate the performance of miRNA predictors constructed with five machine learning (ML) algorithms: random forest (RF), support vector machine (SVM), naive Bayes (NB), k-nearest neighbors (kNN) and decision tree (DT)
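The evaluation protocol in the highlights above (ten-fold cross-validation followed by ROC analysis) can be sketched in plain Python. This is an illustrative sketch, not the authors' code: `stratified_folds`, `roc_auc`, `cross_validate` and the toy `fit_centroid` scorer are hypothetical names, stratification (keeping the 1:N positive-to-negative ratio in every fold) is an assumption, and `fit_centroid` merely stands in for RF, SVM, NB, kNN or DT operating on the 440 features. The AUC is computed from the Mann-Whitney rank-sum statistic, which equals the area under the ROC curve, with tied scores given average ranks.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split sample indices into k folds that each keep the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)  # deal each class round-robin across folds
    return folds


def roc_auc(labels, scores):
    """ROC AUC via the Mann-Whitney rank-sum statistic; ties get average ranks."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative sample")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for t in range(start, end + 1):
            ranks[order[t]] = avg_rank
        start = end + 1
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fit_centroid(X, y):
    """Toy stand-in classifier: higher score means closer to the positive mean."""
    def mean(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    mu_pos = mean([x for x, t in zip(X, y) if t == 1])
    mu_neg = mean([x for x, t in zip(X, y) if t == 0])

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return lambda x: dist2(x, mu_neg) - dist2(x, mu_pos)


def cross_validate(features, labels, fit, k=10, seed=0):
    """Score each fold with a model trained on the other k-1 folds; return pooled AUC."""
    scores = [0.0] * len(labels)
    for test_idx in stratified_folds(labels, k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        for i in test_idx:
            scores[i] = model(features[i])
    return roc_auc(labels, scores)
```

Pooling the held-out scores before computing one AUC assumes that scores from different folds are on a comparable scale; averaging the ten per-fold AUCs is a common alternative. For real experiments, scikit-learn's `StratifiedKFold` and `roc_auc_score` provide tested versions of both pieces.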


Summary

Introduction

MicroRNAs (miRNAs) are small (~22 nucleotides), non-coding RNA molecules with important regulatory roles in gene expression. MiRNAs target a large number of protein-coding genes and are involved in various biological processes, including plant development, growth, abiotic stress responses and pathogen responses [2,3,4]. The genome-wide identification of miRNAs is critical for obtaining a better understanding of the complex post-transcriptional regulation mechanisms involved in these biological processes. Next-generation sequencing (NGS) technologies can discover miRNAs in a high-throughput manner. However, this type of experimental method remains time-consuming because it requires identifying expressed miRNAs among millions of sequencing reads, and it has a limited ability to detect miRNAs that exhibit low, linkage, stress, developmental and/or cell-specific expression [5]. Experimental technologies must therefore be complemented with computational approaches to identify miRNAs at the genome scale, regardless of the availability of NGS data [5,6].
