MiRmap: Comprehensive prediction of microRNA target repression strength

Charles E Vejnar, Evgeny M Zdobnov

doi:10.1093/nar/gks901

Abstract

MicroRNAs, or miRNAs, post-transcriptionally repress the expression of protein-coding genes. The human genome encodes over 1000 miRNA genes that collectively target the majority of messenger RNAs (mRNAs). Base pairing of the so-called miRNA ‘seed’ region with mRNAs identifies many thousands of putative targets. Evaluating the strength of the resulting mRNA repression remains challenging, but is essential for a biologically informative ranking of potential miRNA targets. To address these challenges, predictors may use thermodynamic, evolutionary, probabilistic or sequence-based features. We developed an open-source software library, miRmap, which for the first time comprehensively covers all four approaches using 11 predictor features, 3 of which are novel. This allowed us to examine feature correlations and to compare their predictive power in an unbiased way using high-throughput experimental data from immunopurification, transcriptomics, proteomics and polysome fractionation experiments. Overall, target site accessibility appears to be the most predictive feature. Our novel feature based on PhyloP, which evaluates the significance of negative selection, is the best performing predictor in the evolutionary category. We combined all the features into an integrated model that almost doubles the predictive power of TargetScan. miRmap is freely available from http://cegg.unige.ch/mirmap.

Highlights

MicroRNAs are short (~22 nt) non-coding RNAs that guide the RNA-induced silencing complex (RISC) to post-transcriptionally repress the expression of protein-coding genes by binding to targeted messenger RNAs (1–3)
Novel methods include (i) a more accurate way to compute the binding energy between the miRNA and the messenger RNAs (mRNAs) based on the ensemble free energy instead of the minimum free energy, (ii) an exact method to compute the probability that the seed match is an over-represented motif in the 3′-UTR and (iii) a non-empirical statistical test to assess the significance of target site evolutionary conservation
We examined three simple functions to combine the individual scores of target sites into a global metric at the mRNA level: the best, the sum and the log of the sum of the exponentials
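
The three combination functions named in the last highlight can be sketched as follows. This is a minimal illustration, not the miRmap implementation: the function names are hypothetical, and the sign convention (a lower, more negative site score means stronger predicted repression, as with a free energy) is an assumption made here. The log of the sum of the exponentials is computed with the usual max-shift for numerical stability.

```python
import math

# Assumption: lower (more negative) site scores mean stronger predicted
# repression. All function names are hypothetical, for illustration only.

def combine_best(site_scores):
    """Keep only the single strongest site (lowest score under the assumption)."""
    return min(site_scores)

def combine_sum(site_scores):
    """Add up the scores of all target sites on the mRNA."""
    return sum(site_scores)

def combine_logsumexp(site_scores):
    """Log of the sum of the exponentials of the site scores.

    Shifting by the maximum before exponentiating avoids overflow
    without changing the result.
    """
    shift = max(site_scores)
    return shift + math.log(sum(math.exp(s - shift) for s in site_scores))

if __name__ == "__main__":
    # Toy scores for three sites on one mRNA (made-up values).
    scores = [-0.5, -0.2, -0.1]
    print(combine_best(scores))
    print(combine_sum(scores))
    print(combine_logsumexp(scores))
```

The best function ignores the number of sites, the sum grows linearly with it, and the log of the sum of the exponentials sits between the two, so the choice controls how much additional sites contribute to the mRNA-level score.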

Summary

Introduction

MicroRNAs (miRNAs) are short (~22 nt) non-coding RNAs that guide the RNA-induced silencing complex (RISC) to post-transcriptionally repress the expression of protein-coding genes by binding to targeted messenger RNAs (mRNAs) (1–3). The detailed mechanism of this guidance is not yet resolved, but exact pairing between the so-called ‘seed’ region, positions 2 to 7 (or 8) from the 5′-end of the miRNA, and the 3′-UTR of the mRNA is believed to be necessary for most animal miRNA–mRNA interactions (4). Such miRNA seed pairing with the 3′-UTR of an mRNA is not always sufficient for a functional interaction (4), and in a few specific cases, non-canonical pairing (non-Watson–Crick pairing) with G:U wobbles or mismatches may be acceptable (4,5). Prioritization of targets for any miRNA functional analysis is of critical importance. This necessitates the ranking of potential miRNA targets bearing a seed, rather than predicting in a binary manner if an mRNA is a target or not. We used a collection of features to computationally predict the miRNA repression strength from additional information beyond the seed match, and thereby rank putative miRNA–mRNA interactions in a biologically relevant manner.
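
The seed pairing described above can be illustrated with a short sketch that scans a 3′-UTR for exact Watson–Crick matches to miRNA positions 2 to 7. This is an assumption-laden toy, not the miRmap API: the function names, the default seed bounds and the example sequences are chosen here for illustration, and G:U wobbles or mismatches are deliberately not handled.

```python
# Hypothetical helper names; sequences are RNA strings written 5' to 3'.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def reverse_complement(rna):
    """Return the Watson-Crick reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))

def find_seed_matches(mirna, utr, seed_start=2, seed_end=7):
    """Return 0-based start positions in `utr` of exact seed matches.

    The seed is taken as miRNA positions seed_start..seed_end (1-based,
    inclusive, counted from the 5'-end). The matching site on the mRNA is
    the reverse complement of that seed.
    """
    seed = mirna[seed_start - 1:seed_end]
    site = reverse_complement(seed)
    positions = []
    start = utr.find(site)
    while start != -1:
        positions.append(start)
        start = utr.find(site, start + 1)  # allow overlapping matches
    return positions

if __name__ == "__main__":
    # Example miRNA sequence and a made-up toy UTR containing two sites.
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"
    utr = "AAAUACCUCAGGGUACCUCA"
    print(find_seed_matches(mirna, utr))  # [3, 13]
```

Each position returned is only a putative target site; ranking those sites by predicted repression strength is the step that needs the additional features.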

Methods

Results

Conclusion