Abstract

MicroRNAs (miRNAs) are 20–24 nt, non-coding, single-stranded molecules that regulate traits and stress responses. Their tissue- and time-specific expression limits detection and is therefore a major challenge in their discovery. Wheat has only 119 miRNAs in miRBase, owing to the limitations of conservation-based methodology, which excludes both old and new miRNA genes. This results from the origin of hexaploid wheat through three successive hybridizations, with older AA and BB subgenomes and a younger DD subgenome. Species-specific miRNA prediction (the SMIRP concept), based on 152 thermodynamic features of the training dataset and a support vector machine learning approach, improved prediction accuracy to 97.7%. This has been implemented in TamiRPred (http://webtom.cabgrid.res.in/tamirpred). We also report the highest number of putative wheat miRNA genes (4464) from the whole genome sequence, populated in a database developed in PHP and MySQL. TamiRPred predicted 2092 (>45.10%) additional miRNAs that were not predicted by miRLocator. The predicted miRNAs were validated using miRBase, small RNA libraries, secondary structure, degradome datasets, star miRNAs and binding sites in wheat coding regions. This tool can accelerate the discovery of miRNA polymorphisms for use in wheat trait improvement. Because it predicts miRNA genes chromosome-wise with their respective physical locations, these genes can be transferred using linked SSR markers. This prediction approach can also serve as a model for other polyploid crops.

Highlights

  • MicroRNAs have been identified as important endogenous regulators of various traits and stress responses

  • Among the machine learning methodologies used in the study, viz., artificial neural networks (ANNs), random forest (RF) and support vector machines (SVMs), the model developed using an SVM with a radial basis function kernel (SVM-RBF) was found to have the maximum accuracy of 97.7%

  • Various evaluation measures, namely sensitivity or true positive rate (TPR), specificity or true negative rate (TNR), precision or positive predictive value (PPV), negative predictive value (NPV), fall-out or false positive rate (FPR), false negative rate (FNR), false discovery rate (FDR), accuracy (ACC), F1 score, Matthews correlation coefficient (MCC), informedness and markedness, discussed in the previous section, were adopted to evaluate the models in this study
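
The evaluation measures listed above all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of the standard textbook definitions, not the authors' code. The function name `confusion_metrics` and the counts in the usage line are hypothetical and chosen only for illustration; all denominators are assumed to be non-zero.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Standard binary-classification measures from confusion-matrix counts.

    tp/fp/tn/fn are true positives, false positives, true negatives and
    false negatives. Assumes every denominator below is non-zero.
    """
    tpr = tp / (tp + fn)          # sensitivity, true positive rate
    tnr = tn / (tn + fp)          # specificity, true negative rate
    ppv = tp / (tp + fp)          # precision, positive predictive value
    npv = tn / (tn + fn)          # negative predictive value
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "TPR": tpr,
        "TNR": tnr,
        "PPV": ppv,
        "NPV": npv,
        "FPR": 1.0 - tnr,         # fall-out
        "FNR": 1.0 - tpr,         # miss rate
        "FDR": 1.0 - ppv,         # false discovery rate
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "F1": 2 * tp / (2 * tp + fp + fn),
        "MCC": (tp * tn - fp * fn) / mcc_denominator,
        "informedness": tpr + tnr - 1.0,
        "markedness": ppv + npv - 1.0,
    }


# Hypothetical counts, for illustration only (not results from the study).
example = confusion_metrics(tp=40, fp=5, tn=45, fn=10)
```

For a binary classifier, MCC equals the geometric mean of informedness and markedness whenever both are non-negative, which gives a quick consistency check on the computed values.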



Introduction

MicroRNAs (miRNAs) have been identified as important endogenous regulators of various traits and stress responses. Because they are single-stranded, non-coding, 20–24 nucleotide small RNAs and major post-transcriptional regulators of gene expression, their identification and characterization are of great importance[1]. miRNA polymorphism data can be used in association studies, and associated miRNAs can be transferred in breeding programs using linked polymorphic SSR markers[6]. All of these require a whole-genome-based approach to miRNA discovery. Very recently, it has been reported that during the course of evolution some of the oldest miRNAs get “deleted”, while some “younger” miRNAs, being less conserved, remain unpredicted[10]. Since such events are specific to the species in question, it would be more …
