Uncovering cancer gene regulation by accurate regulatory network inference from uninformative data

Deniz Seçilmiş, Erik L L Sonnhammer, Daniel Morgan, Andreas Tjärnberg, Thomas Hillerton, Torbjörn E M Nordling, Sven Nelander

doi:10.1038/s41540-020-00154-6

Abstract

The interactions among the components of a living cell that constitute the gene regulatory network (GRN) can be inferred from perturbation-based gene expression data. Such networks provide mechanistic insight into a biological system. To explore the feasibility and quality of GRN inference at a large scale, we used the L1000 data, in which ~1000 genes have been perturbed and their expression levels quantified in 9 cancer cell lines. We found that these datasets have a very low signal-to-noise ratio (SNR), which makes them too uninformative for inferring accurate GRNs. We developed a gene reduction pipeline that eliminates uninformative genes from the system, using a selection criterion based on SNR, until an informative subset is reached. The results show that our pipeline can identify an informative subset in an overall uninformative dataset, allowing inference of accurate subset GRNs. The accurate GRNs were functionally characterized, and potential novel cancer-related regulatory interactions were identified.

Highlights

Living organisms are orchestrated by the biochemical reactions that occur as a result of the interactions between biomolecules.
At the gene level, these processes can be represented by a gene regulatory network (GRN), which can be inferred from perturbation-based gene expression data, e.g., where each gene in the system is knocked down in a separate experiment.
Previous studies have shown that accurate GRN inference relies mainly on the signal-to-noise ratio (SNR) of the datasets[7,8,9,10].

Summary

Introduction

Living organisms are orchestrated by the biochemical reactions that occur as a result of the interactions between biomolecules. For that reason, it is important to understand biochemical, physiological, and pathological processes from a gene regulation perspective. At the gene level, these processes can be represented by a gene regulatory network (GRN), which can be inferred from perturbation-based gene expression data, e.g., where each gene in the system is knocked down in a separate experiment. A nested bootstrapping algorithm has been proposed to control the false discovery rate (FDR) in GRN inference from noisy datasets, in order to improve the accuracy of the applied method[8]. Such solutions are applied directly to the GRNs inferred from noisy datasets, and if the dataset as a whole is not informative, the improvement may be limited.
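To make the idea of SNR-based gene reduction concrete, the following is a minimal sketch under stated assumptions, not the pipeline used in this study. It assumes a square response matrix Y (genes x experiments, with experiment j knocking down gene j), a known noise variance, and a linear steady-state model Y = -A^{-1} P. The SNR expressions are simplified proxies chosen for illustration, and all function names (dataset_snr, gene_snr, reduce_genes, infer_grn_least_squares) are hypothetical.

```python
import numpy as np

def dataset_snr(Y, noise_var):
    # Illustrative proxy (an assumption, not the paper's definition):
    # smallest singular value of Y relative to the expected noise magnitude.
    smallest_sv = np.linalg.svd(Y, compute_uv=False)[-1]
    return smallest_sv / np.sqrt(noise_var * Y.size)

def gene_snr(Y, noise_var):
    # Illustrative per-gene proxy: norm of each gene's response (row of Y)
    # relative to the expected noise norm of a row.
    return np.linalg.norm(Y, axis=1) / np.sqrt(noise_var * Y.shape[1])

def reduce_genes(Y, noise_var, target_snr=1.0, min_genes=2):
    # Repeatedly drop the gene with the lowest per-gene score, together with
    # its knockdown experiment, until the remaining subset reaches target_snr.
    keep = np.arange(Y.shape[0])
    while len(keep) > min_genes:
        sub = Y[np.ix_(keep, keep)]
        if dataset_snr(sub, noise_var) >= target_snr:
            break
        worst = np.argmin(gene_snr(sub, noise_var))
        keep = np.delete(keep, worst)
    return keep

def infer_grn_least_squares(Y, P):
    # Assumed linear steady-state model Y = -A^{-1} P, hence A = -P Y^+,
    # where Y^+ is the Moore-Penrose pseudoinverse.
    return -P @ np.linalg.pinv(Y)

# Toy data: three genes respond strongly to their knockdown, two barely respond.
Y = np.diag([-2.0, -2.0, -2.0, -0.01, -0.01])
keep = reduce_genes(Y, noise_var=0.01)
Y_sub = Y[np.ix_(keep, keep)]
P_sub = -np.eye(len(keep))  # one unit knockdown per remaining gene (assumed design)
A_sub = infer_grn_least_squares(Y_sub, P_sub)
print("genes kept:", keep.tolist())
print("inferred subset GRN:")
print(A_sub)
```

In this toy setting the two weakly responding genes are removed before inference, so the least-squares estimate is computed only on the subset whose responses stand out from the assumed noise level. Any sparse or regularized inference method could take the place of the least-squares step.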

Methods

Results

Conclusion