Abstract

Background: MicroRNAs (miRNAs) are small noncoding RNAs that regulate gene expression post-transcriptionally through multiple mechanisms, including mRNA degradation, translational inhibition, and mRNA stabilization. They are involved in cellular processes ranging from development to homeostasis, and their deregulation is implicated in several diseases, including cancer. Because miRNAs lack a poly(A) tail, standard single-cell RNA-seq protocols do not capture them, which severely limits our understanding of miRNA function at cellular resolution. To overcome this limitation, we develop a novel machine learning method that infers miRNA activity in a sample from its RNA-seq profile.

Methods: We use XGBoost to build a model that predicts a sample's miRNA profile from its global mRNA profile. We train and test the model with cross-validation on the CCLE collection and on several healthy and cancer human tissue datasets from GTEx and TCGA. We quantify performance as the correlation between actual and predicted miRNA expression values across the test samples. We validate the model on multiple single-cell datasets in which miRNA and mRNA profiles are available for the same cell types, assessing whether a model trained on independent bulk datasets can identify cell-type-specific miRNAs.

Results: First, we show in the CCLE collection and in multiple TCGA tissues that a model based on all genes is far more accurate than one based only on known miRNA targets. Across 10 tissues with more than 100 paired miRNA and mRNA samples, the mean cross-validation accuracy (Spearman correlation between the predicted and actual expression of a miRNA) is 0.45, ranging from 0.39 in pancreas to 0.51 in brain.
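The cross-validated accuracy metric (Spearman correlation between actual and predicted expression of one miRNA across held-out samples) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the abstract uses XGBoost, whereas the hypothetical `fit_knn` below is a k-nearest-neighbour stand-in so the example runs without third-party packages (a real pipeline might wrap `xgboost.XGBRegressor` to match the same `fit(X, y) -> predict(x)` shape). It also assumes out-of-fold predictions are pooled before computing one correlation per miRNA; averaging per-fold correlations is an equally plausible reading. All function names are illustrative.

```python
import random
from statistics import mean


def rank(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for constant input
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(rank(x), rank(y))


def fit_knn(X_train, y_train, k=3):
    """Hypothetical stand-in for XGBoost: k-nearest-neighbour regression
    on mRNA profiles (Euclidean distance)."""
    def predict(x):
        nearest = sorted(
            zip(X_train, y_train),
            key=lambda pair: sum((a - b) ** 2 for a, b in zip(x, pair[0])),
        )[:k]
        return mean(yi for _, yi in nearest)
    return predict


def cv_accuracy(X, y, fit=fit_knn, n_folds=5, seed=0):
    """Accuracy for ONE miRNA: Spearman correlation between its actual
    expression y and out-of-fold predictions from mRNA profiles X."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for f in range(n_folds):
        test = idx[f::n_folds]          # every n_folds-th shuffled sample
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = model(X[i])
    return spearman(y, preds)


if __name__ == "__main__":
    # Synthetic toy data: 60 samples x 4 genes; the "miRNA" tracks genes 0 and 2.
    rng = random.Random(1)
    X = [[rng.random() for _ in range(4)] for _ in range(60)]
    y = [2 * x[0] - x[2] + rng.gauss(0, 0.05) for x in X]
    print(f"CV Spearman accuracy: {cv_accuracy(X, y):.2f}")
```

Averaging `cv_accuracy` over every miRNA in a tissue gives a per-tissue summary of the kind reported above; restricting each feature vector to a miRNA's known targets instead of all genes would set up the all-genes versus known-targets comparison.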
Compared with normal tissues in GTEx, our model performs significantly better in their malignant counterparts in TCGA (average cross-validation accuracy improvement of 0.19), owing to the greater heterogeneity, and therefore greater variability in gene expression, of tumor samples. We have also validated the model on independent single-cell data. Using a model trained on bulk tissue data, we predict miRNA expression levels in single cells from their RNA profiles and compare the predicted fold difference in each miRNA's expression between two cell types with the actual fold difference. We quantify prediction accuracy as the correlation between predicted and actual fold differences across all miRNAs. Across four cell-type pair comparisons (drawn from kidney, brain, breast, and skin), the model achieves an average accuracy of 0.81 (range 0.73 to 0.89), strongly validating it. Next, we plan to apply the model, in collaboration with experimentalists, to study miRNA activity during T cell development and in pancreatic ductal adenocarcinoma and glioblastoma.

Conclusions: Our method addresses a major bottleneck in studying miRNA activity at cellular resolution and can be applied to any single-cell RNA-seq dataset to infer miRNA activity.

Citation Format: Gulden Olgun, Vishaka Gopalan, Sridhar Hannenhalli. Quantifying miRNA activity in single cell clusters [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 198.
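The single-cell validation step in the Results (correlating predicted and actual between-cell-type fold differences across miRNAs) can be sketched as below. The assumptions here are not confirmed by the abstract: expression values are taken to be on a log2 scale, so a fold difference is a difference of means; Pearson correlation is used because the abstract says only "correlation"; and the input dictionaries and all function names are hypothetical.

```python
from math import sqrt
from statistics import mean


def fold_differences(expr_a, expr_b):
    """Per-miRNA log2 fold difference between cell types A and B.

    expr_a / expr_b map a miRNA name to a list of log2-scale values
    (one per cell, or a single pseudo-bulk value) for that cell type.
    Only miRNAs present in both inputs are compared.
    """
    shared = sorted(set(expr_a) & set(expr_b))
    return {m: mean(expr_a[m]) - mean(expr_b[m]) for m in shared}


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined if all fold differences are equal
    return sxy / sqrt(sxx * syy)


def fold_difference_accuracy(pred_a, pred_b, actual_a, actual_b):
    """Correlation, across miRNAs, between predicted fold differences
    (from model output on scRNA-seq) and measured fold differences."""
    pred_fd = fold_differences(pred_a, pred_b)
    actual_fd = fold_differences(actual_a, actual_b)
    shared = sorted(set(pred_fd) & set(actual_fd))
    return pearson([pred_fd[m] for m in shared], [actual_fd[m] for m in shared])


if __name__ == "__main__":
    # Toy, made-up values for three miRNAs in two cell types.
    pred_a = {"miR-1": [2.0, 4.0], "miR-2": [1.0], "miR-3": [5.0]}
    pred_b = {"miR-1": [1.0], "miR-2": [1.0], "miR-3": [2.0]}
    actual_a = {"miR-1": [4.0], "miR-2": [0.5], "miR-3": [5.0]}
    actual_b = {"miR-1": [0.0], "miR-2": [0.0], "miR-3": [0.0]}
    print(fold_difference_accuracy(pred_a, pred_b, actual_a, actual_b))
```

Working with fold differences rather than raw predicted levels means any constant offset between the bulk-trained model's scale and the single-cell measurements cancels out, which is one reason such a comparison suits cross-platform validation.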
