Abstract

The interactions among the components of a living cell that constitute the gene regulatory network (GRN) can be inferred from perturbation-based gene expression data. Such networks are useful for providing mechanistic insights into a biological system. To explore the feasibility and quality of GRN inference at a large scale, we used the L1000 dataset, in which ~1000 genes have been perturbed and their expression levels quantified in 9 cancer cell lines. We found that these datasets have a very low signal-to-noise ratio (SNR), which makes them too uninformative to infer accurate GRNs. We developed a gene reduction pipeline that eliminates uninformative genes from the system, using a selection criterion based on SNR, until an informative subset is reached. The results show that our pipeline can identify an informative subset within an overall uninformative dataset, allowing inference of accurate GRNs for that subset. The accurate GRNs were functionally characterized, and potential novel cancer-related regulatory interactions were identified.
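The abstract describes the gene reduction step only at a high level. What follows is a minimal sketch of one way an SNR-driven elimination could work, not the authors' pipeline. It assumes (i) a square design in which experiment j knocks down gene j, so dropping gene i removes both row i and column i of the expression matrix `Y`; (ii) a simplified SNR, defined here as the smallest singular value of `Y` divided by a known noise standard deviation; and (iii) a greedy rule that, at each step, drops the gene whose removal yields the highest SNR. The names `snr`, `reduce_genes`, `noise_std`, `target_snr` and `min_genes` are hypothetical.

```python
import numpy as np


def snr(Y: np.ndarray, noise_std: float) -> float:
    """Simplified SNR (assumption): smallest singular value of Y over noise_std."""
    # np.linalg.svd returns singular values in descending order.
    sigma_min = np.linalg.svd(Y, compute_uv=False)[-1]
    return float(sigma_min / noise_std)


def reduce_genes(Y: np.ndarray, noise_std: float, target_snr: float,
                 min_genes: int = 2) -> tuple[list[int], float]:
    """Greedily drop genes until the kept subset reaches target_snr.

    Assumes a square design: experiment j knocks down gene j, so dropping
    gene i removes row i and column i of Y (genes x experiments).
    Returns the kept gene indices and the SNR of the kept submatrix.
    """
    if Y.shape[0] != Y.shape[1]:
        raise ValueError("expected a square genes x experiments matrix")
    min_genes = max(min_genes, 1)  # never score an empty submatrix
    keep = list(range(Y.shape[0]))
    current = snr(Y, noise_std)
    while current < target_snr and len(keep) > min_genes:
        best_pos, best_snr = 0, -np.inf
        # Try removing each remaining gene and score the resulting subset.
        for pos in range(len(keep)):
            trial = keep[:pos] + keep[pos + 1:]
            s = snr(Y[np.ix_(trial, trial)], noise_std)
            if s > best_snr:
                best_pos, best_snr = pos, s
        keep.pop(best_pos)
        current = best_snr
    return keep, current
```

For example, `reduce_genes(np.diag([10.0, 9.0, 8.0, 0.02, 0.01]), noise_std=1.0, target_snr=1.0)` drops the two weakest genes and returns `[0, 1, 2]` with an SNR of about 8. The exhaustive search costs one SVD per remaining gene per step, which is cheap on toy data but may be slow for matrices with ~1000 genes.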

Highlights

  • Living organisms are orchestrated by the biochemical reactions that occur as a result of the interactions between biomolecules

  • At the gene level, these processes can be represented by a gene regulatory network (GRN), which can be inferred from perturbation-based gene expression data, e.g., data in which each gene in the system is knocked down in a separate experiment

  • Previous studies have shown that accurate GRN inference relies mainly on the signal-to-noise ratio (SNR) of the datasets[7,8,9,10]

Introduction

Living organisms are orchestrated by the biochemical reactions that occur as a result of the interactions between biomolecules. For that reason, understanding biochemical, physiological, and pathological processes from a gene regulation perspective is important. At the gene level, these processes can be represented by a gene regulatory network (GRN), which can be inferred from perturbation-based gene expression data, e.g., data in which each gene in the system is knocked down in a separate experiment. A nested bootstrapping algorithm has been proposed to control the false discovery rate (FDR) in GRN inference from noisy datasets, in order to improve the accuracy of the applied method[8]. These solutions are applied directly to GRNs inferred from noisy datasets, and if the dataset as a whole is not informative, the improvement may be limited.
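To make the inference step concrete, here is a minimal sketch assuming the linear steady-state model `A @ Y + P = 0` that is often used for knockdown data, where `Y` holds expression changes, `P` encodes which gene was knocked down in each experiment, and `A` is the network to estimate. The least-squares estimate is `A_hat = -P @ pinv(Y)`. The resampling shown is a plain bootstrap over experiments that reports how often each edge exceeds a threshold; it is NOT the nested bootstrapping algorithm of [8]. The function names, `threshold`, `n_boot` and `seed` are hypothetical.

```python
import numpy as np


def infer_grn_ls(Y: np.ndarray, P: np.ndarray) -> np.ndarray:
    """Least-squares GRN estimate under the assumed model A @ Y + P = 0.

    Y: (genes x experiments) steady-state expression changes.
    P: (genes x experiments) design, e.g. -1 where a gene was knocked
       down in that experiment and 0 elsewhere.
    Returns A_hat, where A_hat[i, j] is the estimated effect of gene j on gene i.
    """
    # A Y = -P  =>  A = -P Y^+ (Moore-Penrose pseudoinverse).
    return -P @ np.linalg.pinv(Y)


def bootstrap_edge_support(Y: np.ndarray, P: np.ndarray, n_boot: int = 100,
                           threshold: float = 0.1, seed: int = 0) -> np.ndarray:
    """Fraction of bootstrap resamples in which |A_hat[i, j]| > threshold.

    Plain bootstrap over experiments (columns), for illustration only.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_exp = Y.shape
    counts = np.zeros((n_genes, n_genes))
    for _ in range(n_boot):
        cols = rng.integers(0, n_exp, size=n_exp)  # resample with replacement
        A_hat = infer_grn_ls(Y[:, cols], P[:, cols])
        counts += (np.abs(A_hat) > threshold)
    return counts / n_boot
```

On noise-free data from a three-gene cascade with four knockdowns per gene, `infer_grn_ls` recovers the generating matrix exactly, because the full design has full row rank. With noisy data, the support frequencies, rather than a single point estimate, indicate which edges are stable across resamples.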
