Significant improvement of miRNA target prediction accuracy in large datasets using meta-strategy based on comprehensive voting and artificial neural networks

Bi Zhao, Bin Xue

doi:10.1186/s12864-019-5528-1

Abstract

Background: Identifying mRNA targets of miRNAs is critical for studying gene expression regulation at the whole-genome level. Multiple computational tools have been developed to predict miRNA:mRNA interactions. However, many of these tools were developed on various small datasets, each of which represents a limited sample space. As a result, their prediction accuracy has not been systematically validated at a larger scale, which makes it challenging to compare the tools and determine their applicability. In addition, the accuracy of these tools, especially on large datasets, needs to be improved for broader applications.

Results: In this project, a large dataset containing more than 46,600 miRNA:mRNA interactions was assembled and split into eleven subsets based on the availability of prediction scores from four individual predictors: miRanda, miRDB, PITA, and TargetScan. In each subset, the predictive results of the four individual predictors were integrated using decision-tree-based artificial neural networks to make the meta-prediction. The decision tree sorts the predictive results of the four individual predictors, and the artificial neural networks make the meta-prediction from the outputs of the individual predictors. Dual-threshold and two-step significance-voting were incorporated in the decision tree, and information gain was analysed to select threshold values. Under multi-fold cross-validation and in independent datasets, the new strategy performed significantly better in most of the eleven datasets than the individual predictors and other meta-predictors, such as ComiR. In independent datasets, the overall improvement in prediction accuracy is at least 9 percentage points over the other predictors, and the F1 and MCC scores improve by at least 40%.

Conclusions: The combination of dual-threshold, two-step significance-voting, and information-gain analysis is very effective in optimizing the outcome of the decision tree, and further integration with artificial neural networks is critical for improving the performance of the meta-predictor. A new pipeline for miRNA target prediction based on this integration has been developed. A strategy that uses the outputs of individual predictors to reorganize a large-scale miRNA:mRNA interaction dataset has also been validated and used to evaluate the prediction accuracy of predictors. The predictor is available at https://github.com/xueLab/mirTarDANN.
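
The abstract states that information gain was analysed to select threshold values but does not give the procedure. The following is a minimal sketch of the general technique only, assuming binary labels (1 = validated interaction, 0 = non-interaction) and a single predictor score per pair. The function names and the toy data are hypothetical and are not taken from the authors' pipeline.

```python
import math
from typing import Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy, in bits, of a list of binary labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def information_gain(scores: Sequence[float], labels: Sequence[int],
                     threshold: float) -> float:
    """Entropy reduction from splitting the pairs at `threshold`."""
    below = [y for s, y in zip(scores, labels) if s < threshold]
    at_or_above = [y for s, y in zip(scores, labels) if s >= threshold]
    n = len(labels)
    weighted = (len(below) / n) * entropy(below) \
        + (len(at_or_above) / n) * entropy(at_or_above)
    return entropy(labels) - weighted


def best_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Return the observed score that gives the largest information gain."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: information_gain(scores, labels, t))


# Toy data (made up for illustration): low scores are negatives.
toy_scores = [0.1, 0.2, 0.8, 0.9]
toy_labels = [0, 0, 1, 1]
print(best_threshold(toy_scores, toy_labels))
```

A dual-threshold scheme would need two such cut-offs per predictor. How the paper chooses the second one is not described in this text, so it is left out of the sketch.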

Highlights

Identifying mRNA targets of miRNAs is critical for studying gene expression regulation at the whole-genome level
Performance of individual predictors in eleven predictor-specific datasets
A large dataset containing more than 40,000 miRNA:mRNA interactions was assembled
Based on our previous studies, in which an artificial neural network (ANN) or dual-threshold sequential-voting was used to integrate the predictive results of individual predictors, a novel meta-strategy infrastructure that combines ANN and decision tree was designed in this project to make meta-predictions of miRNA targets using the predictive scores of four individual predictors: miRanda, miRDB, PITA, and TargetScan
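
The eleven subsets are described only as being defined by which of the four predictors provide a score for a given miRNA:mRNA pair. The sketch below shows one way such a split could be done. It assumes, without confirmation from this text, that a pair is kept only when at least two predictors score it; that reading is used here because it yields exactly eleven availability patterns (6 pairs, 4 triples, and 1 quadruple of predictors). The function names and the `min_predictors` parameter are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

PREDICTORS = ("miRanda", "miRDB", "PITA", "TargetScan")


def availability_key(scores: dict) -> tuple:
    """Predictors, in fixed order, that have a score for this pair."""
    return tuple(p for p in PREDICTORS if scores.get(p) is not None)


def split_by_availability(interactions: dict, min_predictors: int = 2) -> dict:
    """Group pair identifiers by which predictors scored them.

    `min_predictors=2` is an assumption, not a value stated in the paper.
    """
    subsets = defaultdict(list)
    for pair_id, scores in interactions.items():
        key = availability_key(scores)
        if len(key) >= min_predictors:
            subsets[key].append(pair_id)
    return dict(subsets)


# Availability patterns with two or more of the four predictors.
possible_patterns = [c for k in range(2, 5)
                     for c in combinations(PREDICTORS, k)]
print(len(possible_patterns))  # 6 + 4 + 1 patterns
```

Each resulting group could then be given its own meta-predictor, since the set of input scores differs from group to group.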

Summary

Introduction

Identifying mRNA targets of miRNAs is critical for studying gene expression regulation at the whole-genome level. Multiple computational tools have been developed to predict miRNA:mRNA interactions. Many of these tools were developed on various small datasets, each of which represents a limited sample space, so comparing their prediction accuracy and determining their applicability is challenging. Although very short compared with many other types of RNA molecules, miRNAs play critical roles in genome-wide gene expression regulation. Many mammalian genomes each contain more than 2000 miRNAs [11]. These miRNAs are estimated to regulate about 60% of all genes in each genome [12,13,14].

Methods

Results

Discussion

Conclusion