miREM: an expectation-maximization approach for prioritizing miRNAs associated with gene-set

Luqman Hakim Abdul Hadi, Richie Soong, Marie Loh, Touati Benoukraf, Agus Salim, Quy Xiao Xuan Lin, Hong Kiat Ng, Tri Tran Minh

doi:10.1186/s12859-018-2292-1

Abstract

Background: Knowledge of how miRNAs regulate the expression of sets of mRNAs has led to novel insights into numerous and diverse cellular mechanisms. While a single miRNA may regulate many genes, one gene can also be regulated by multiple miRNAs. This many-to-many relationship is difficult to model for accurate predictions.

Results: Here, we introduce miREM, a program that couples an expectation-maximization (EM) algorithm with the common hypergeometric probability (HP) approach to improve the prediction and prioritization of miRNAs from gene-sets of interest. miREM is available as a web server (https://bioinfo-csi.nus.edu.sg/mirem2/) with an intuitive graphical user interface. The program incorporates a large compendium of human/mouse miRNA-target prediction databases to enhance prediction. Users can upload their genes of interest in various formats and choose among filtering options, such as whether to consider non-conserved miRNAs. Results are reported in a rich graphical interface that allows users to: (i) prioritize predicted miRNAs through a scatterplot of HP p-values and EM scores; (ii) visualize the predicted miRNAs and corresponding genes through a heatmap; and (iii) identify and filter homologous or duplicated predictions by clustering them according to their seed sequences.

Conclusion: We tested miREM using RNAseq datasets from two single “spiked” knock-in miRNA experiments and two double knock-out miRNA experiments. From the gene-set signatures, miREM assigned high EM scores to the manipulated miRNAs, which were among the top predictions in both the single knock-in and double knock-out experiments. Finally, we demonstrated that miREM predictions are similar to or better than those of existing programs.
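The hypergeometric probability (HP) test that miREM builds on asks whether a gene-set contains more predicted targets of a given miRNA than expected by chance. The sketch below computes the upper-tail p-value P(X >= k) with exact integer arithmetic. The function name, its parameters, and the idea of a fixed background "universe" of genes are illustrative assumptions; this section does not describe how miREM defines its background.

```python
from math import comb


def hypergeom_pvalue(overlap: int, gene_set_size: int, n_targets: int, background_size: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= overlap).

    background_size: genes in the universe (N)
    n_targets:       predicted targets of the miRNA within the universe (K)
    gene_set_size:   genes in the user's gene-set (n)
    overlap:         gene-set genes that are predicted targets (k)
    """
    if not (0 <= n_targets <= background_size and 0 <= gene_set_size <= background_size):
        raise ValueError("target and gene-set sizes must lie within the background")
    upper = min(n_targets, gene_set_size)
    # Count draws containing i >= overlap targets; comb() returns 0 when its second
    # argument exceeds the first, so impossible terms contribute nothing.
    tail = sum(
        comb(n_targets, i) * comb(background_size - n_targets, gene_set_size - i)
        for i in range(max(overlap, 0), upper + 1)
    )
    # Python's int / int is correctly rounded even for very large integers.
    return tail / comb(background_size, gene_set_size)


# Hypothetical toy numbers: 20,000 background genes, a 200-gene set, and a miRNA
# with 500 predicted targets, 20 of which fall in the set (about 5 expected by chance).
print(hypergeom_pvalue(20, 200, 500, 20_000))
```

Exact binomial coefficients avoid the floating-point cancellation a naive factorial formula would suffer. A library routine such as SciPy's hypergeometric survival function would serve the same purpose.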

Highlights

Knowledge of how microRNAs (miRNAs) regulate the expression of sets of messenger RNAs (mRNAs) has led to novel insights into numerous and diverse cellular mechanisms.
In contrast to current methods based only on hypergeometric probability (HP), we introduce a novel strategy that complements HP by (i) down-weighting the contribution of overlapping target genes when calculating the significance of each miRNA signature, using an expectation-maximization (EM) algorithm, a general probabilistic framework suited to this purpose [12]; and (ii) clustering all predicted miRNAs by their seed-region sequences to identify “synonymous” predictions.
Using a gene-set of repressed genes as input (Additional file 3: Table S2), we ran miREM, CORNA, GeneSet2MiRNA and ChemiRs (Table 2 and Additional file 4: Table S3); for Sylamer, the whole gene list ranked by fold change was used as input. miREM correctly predicted the manipulated miRNAs, ranking hsa-miR-155-5p and hsa-miR-1-3p first and third, respectively.
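Clustering predictions by seed sequence groups miRNAs that share a seed and are therefore likely to have overlapping target sets, which makes "synonymous" hits easy to spot. Below is a minimal sketch. It assumes the seed is nucleotides 2 to 8 of the mature sequence, a common convention; this section does not state the exact window miREM uses. The function name and the toy miRNA names and sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def cluster_by_seed(mature_seqs: Dict[str, str], start: int = 2, end: int = 8) -> Dict[str, List[str]]:
    """Group miRNA names by their seed region (1-based, inclusive positions start..end)."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for name, seq in mature_seqs.items():
        rna = seq.upper().replace("T", "U")  # accept DNA or RNA alphabets
        clusters[rna[start - 1:end]].append(name)
    # Sort members so each cluster's listing does not depend on input order.
    return {seed: sorted(names) for seed, names in clusters.items()}


toy = {  # hypothetical names and sequences, not real miRBase entries
    "toy-miR-a": "AUCGAUCGAUCGAUCGAUCGAA",
    "toy-miR-b": "GUCGAUCGUUUUAAAACCCCGG",
    "toy-miR-c": "ACCCGGGAAAUUUCCCGGGAAA",
}
print(cluster_by_seed(toy))
# {'UCGAUCG': ['toy-miR-a', 'toy-miR-b'], 'CCCGGGA': ['toy-miR-c']}
```

Exact seed matching is the simplest grouping. Looser schemes, such as allowing offset seeds or one mismatch, would need a similarity-based clustering instead of a dictionary key.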

Summary

Results

We have developed miREM, an HP-EM-based program designed to predict miRNA activities from a gene list. miREM’s web server incorporates a large compendium of human/mouse miRNA-target prediction databases and provides rich output that facilitates the prioritization and interpretation of predicted results. To test miREM’s performance, we benchmarked its predictions against those of CORNA [7], GeneSet2MiRNA [8], ChemiRs [9], and Sylamer [10] using several datasets with known miRNA activities. These benchmarks are detailed in three case studies as follows.

Case study 1: knock-in miRNA experiments

We used two RNAseq expression datasets from miR-155 and miR-1 knock-in experiments in U2OS cells [25]. For each experiment, we used a gene-set of repressed genes as input (Additional file 3: Table S2) and ran miREM, CORNA, GeneSet2MiRNA and ChemiRs (Table 2 and Additional file 4: Table S3); for Sylamer, the whole gene list ranked by fold change was used as input. We also tested miREM’s performance under different HP p-value thresholds and EM convergence parameters, using the downregulated gene list from the hsa-miR-155 knock-in experiment. hsa-miR-155-5p remained the first-ranked candidate across the various prediction settings (Additional file 6: Table S5).
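The case study varies two settings: an HP p-value threshold and the EM convergence parameters. Below is a minimal sketch of a generic mixture-model EM that makes both concrete. Each gene-set gene is shared among the candidate miRNAs predicted to target it, plus a background component. A gene targeted by several miRNAs is therefore split among them rather than counted in full for each. This likelihood is an illustrative assumption, not miREM's published model. The names `em_scores`, `p_threshold`, `tol` and `max_iter` are hypothetical.

```python
from typing import Dict, Optional, Set


def em_scores(
    gene_set: Set[str],
    targets: Dict[str, Set[str]],
    background_size: int,
    p_values: Optional[Dict[str, float]] = None,
    p_threshold: float = 0.05,
    tol: float = 1e-6,
    max_iter: int = 1000,
) -> Dict[str, float]:
    """Estimate the fraction of the gene-set attributable to each candidate miRNA."""
    # Keep miRNAs with at least one target and, if p-values are given, HP p <= threshold.
    cands = {
        m: t for m, t in targets.items()
        if t and (p_values is None or p_values.get(m, 1.0) <= p_threshold)
    }
    genes = sorted(gene_set)
    if not genes or not cands:
        return {}
    comps = list(cands) + ["background"]
    # P(gene | component): uniform over a miRNA's targets, or over the whole background.
    lik = {
        g: {**{m: (1.0 / len(t) if g in t else 0.0) for m, t in cands.items()},
            "background": 1.0 / background_size}
        for g in genes
    }
    pi = {c: 1.0 / len(comps) for c in comps}  # mixing weights, start uniform
    for _ in range(max_iter):
        # E-step: split each gene among components in proportion to pi * likelihood.
        totals = dict.fromkeys(comps, 0.0)
        for g in genes:
            denom = sum(pi[c] * lik[g][c] for c in comps)
            if denom == 0.0:  # only possible if every relevant weight underflowed
                continue
            for c in comps:
                totals[c] += pi[c] * lik[g][c] / denom
        # M-step: new weights are the average responsibilities over the gene-set.
        new_pi = {c: totals[c] / len(genes) for c in comps}
        converged = max(abs(new_pi[c] - pi[c]) for c in comps) < tol
        pi = new_pi
        if converged:
            break
    return {m: pi[m] for m in cands}


# Hypothetical toy data: 1,000 background genes and a 10-gene set.
gene_set = {f"g{i}" for i in range(1, 11)}
targets = {
    "miR-A": {f"g{i}" for i in range(1, 9)} | {"x1", "x2"},  # 8 of its 10 targets in the set
    "miR-B": {"g7", "g8"} | {f"y{i}" for i in range(48)},    # shares g7 and g8 with miR-A
}
print(em_scores(gene_set, targets, background_size=1000))
```

In this toy run, miR-A absorbs the shared genes g7 and g8 and its weight approaches about 0.8, while miR-B's weight decays toward zero. With HP alone, miR-B would be credited with both shared genes. Loosening `p_threshold` admits more candidates, and tightening `tol` runs more iterations before stopping.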

Conclusion