Abstract
Small-scale regulatory networks can model known biological processes, while large-scale genome-wide datasets can identify novel mechanisms. We developed a biophysical modeling framework that combines the accuracy of small-scale networks with the power of large-scale datasets. As a proof of principle, we applied this framework to z-normalized RNAseq data from a cohort of 844 multiple myeloma (MM) patients (and 1,092 TCGA breast cancer patients), using t-Distributed Stochastic Neighbor Embedding (t-SNE) to construct a disease-specific transcriptomic map on which genes that lie close to each other are co-expressed within the cohort. Fuzzy c-means clustering was then carried out to identify clusters of genes that are likely regulated by a common transcription factor (TF). We constructed a gene regulatory network (GRN) for each cluster of co-expressing genes on the disease-specific transcriptomic map by identifying upstream TFs for each cluster using the publicly available databases ENCODE and ChEA, and kinases that phosphorylate these TFs using PhosphoSitePlus and PhosphoPoint. For each gene in a cluster, the exhaustive list of candidate TFs and kinases was reduced to a few key predictor variables using regression tree modeling. This yields a cascading network of kinases that phosphorylate TFs, which in turn regulate the expression of the genes in a cluster. We derived a mechanistic model from first principles to define the functional relationships governing the GRN, in which transcription, translation, and post-translational modifications are modeled using first-order reversible reaction kinetic equations. The patient-specific rate constants of the model are parametrized by single-sample gene set enrichment analysis (ssGSEA) scores of key KEGG pathways such as ribosome, protein synthesis, and RNA degradation.
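The clustering step can be illustrated with a minimal, standard fuzzy c-means sketch, assuming each gene is represented by its two-dimensional t-SNE coordinates. The abstract does not give the number of clusters, the fuzzifier m, or the convergence criteria; the values used below (`c`, `m=2.0`, `tol`, `seed`) and the function name `fuzzy_c_means` are illustrative assumptions, not the authors' settings. Unlike k-means, each gene receives a membership degree in every cluster, and its degrees sum to 1.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # assumed: one 2-D t-SNE coordinate per gene


def fuzzy_c_means(points: Sequence[Point], c: int, m: float = 2.0,
                  max_iter: int = 300, tol: float = 1e-6, seed: int = 0
                  ) -> Tuple[List[Point], List[List[float]]]:
    """Hypothetical helper: standard fuzzy c-means on 2-D points.

    Returns (centers, u) where u[i][j] is the membership of gene i in cluster j.
    c, m, tol and seed are illustrative defaults, not values from the study.
    """
    rng = random.Random(seed)
    n = len(points)
    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers: List[Point] = []
    for _ in range(max_iter):
        # Centre update: mean of all points weighted by membership**m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            sw = sum(w)
            centers.append((sum(wi * p[0] for wi, p in zip(w, points)) / sw,
                            sum(wi * p[1] for wi, p in zip(w, points)) / sw))
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik) ** (2 / (m - 1)).
        new_u = []
        for p in points:
            d = [math.dist(p, v) for v in centers]
            if min(d) == 0.0:
                # Point coincides with a centre: full membership there.
                row = [1.0 if dj == 0.0 else 0.0 for dj in d]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((dj / dk) ** (2.0 / (m - 1.0)) for dk in d)
                       for dj in d]
            new_u.append(row)
        # Stop when no membership changes by more than tol.
        shift = max(abs(a - b) for ra, rb in zip(u, new_u) for a, b in zip(ra, rb))
        u = new_u
        if shift < tol:
            break
    return centers, u
```

For example, running `fuzzy_c_means(coords, c=2)` on two well-separated groups of toy points places each group in its own cluster when every gene is labelled by its highest membership; genes lying between groups would instead receive intermediate memberships.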
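To illustrate how first-order kinetics collapse to an algebraic relation at steady state, consider a deliberately simplified, hypothetical cascade with one TF and one kinase; the abstract does not give the authors' actual equations. TF mRNA r_T is translated into total TF protein P; P is reversibly phosphorylated to its active form P* at a rate proportional to the kinase level K, which is treated as constant; and P* drives transcription of a target gene's mRNA r_g. Here translation and transcription are written as first-order production balanced by first-order decay, and all rate-constant symbols (k_tl, δ_P, k_p, k_u, k_tx, δ_r) are placeholders.

```latex
\begin{aligned}
\frac{dP}{dt} &= k_{tl}\,r_T - \delta_P\,P = 0
  &&\Rightarrow\quad P = \frac{k_{tl}}{\delta_P}\,r_T \\
\frac{dP^{*}}{dt} &= k_{p}\,K\,(P - P^{*}) - k_{u}\,P^{*} = 0
  &&\Rightarrow\quad P^{*} = P\,\frac{k_{p}K}{k_{p}K + k_{u}} \\
\frac{dr_g}{dt} &= k_{tx}\,P^{*} - \delta_r\,r_g = 0
  &&\Rightarrow\quad r_g = \frac{k_{tx}}{\delta_r}\,P^{*} \\
\therefore\quad r_g &= \frac{k_{tx}\,k_{tl}}{\delta_r\,\delta_P}\; r_T\;\frac{k_{p}K}{k_{p}K + k_{u}}
\end{aligned}
```

Setting each time derivative to zero expresses target expression purely in terms of upstream TF and kinase levels, linear in r_T and saturating in K. How the ssGSEA pathway scores map onto these rate constants is not specified in the abstract.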
Under steady-state conditions, the system of differential equations reduces to an algebraic equation that can predict the expression of every gene in a cluster from the expression of its upstream TFs and kinases alone; this equation was fitted to RNAseq data from 422 MM patients to estimate the undetermined parameters. The remaining patients' data were used to assess model accuracy with Pearson's correlation coefficient, r, between model-predicted and actual expression. Of 16,738 genes, 7,936 were predicted accurately (r > 0.5), while the remaining genes showed a significant overlap (hypergeometric test; p-value < 1e-48; representation factor = 7.27) with genes that have high variability in chromatin accessibility across patients. A reduced GRN containing only accurately predicted genes was obtained for each cluster, and the GRNs were then linked to each other through TFs and kinases that feature in other GRNs; betweenness centrality measures of the resulting directed graph identify disease-specific master regulators. MYC, STAT3, CREB1, POLR2A, PLK1, and TP53 were found to be key hubs in the MM network; similar analyses are being conducted for other cancers featured in TCGA.
Citation Format: Praneeth Reddy Sudalagunta, Rafael Renatino Canevarolo, Mark Meads, Maria Coelho Silva, Xiaohong Zhao, Raghunandan Reddy Alugubelli, Joon-hyun Song, Erez Persi, Mehdi Damaghi, Kenneth H. Shain, Ariosto Silva. A novel gene regulatory network model identifies master regulators in cancer. [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 4313.
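The evaluation statistics can be sketched with the Python standard library: Pearson's r between predicted and observed expression of one gene across held-out patients, and a one-sided hypergeometric test for the overlap between two gene sets within a gene universe. The representation factor is taken here as the observed overlap divided by the overlap expected by chance, (set_a × set_b) / universe, which is the common definition; the abstract does not state which definition was used. The function names are hypothetical, and the test values are toy numbers, not the study's data.

```python
import math
from typing import Sequence


def pearson_r(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation between model-predicted and observed expression."""
    n = len(predicted)
    mx = sum(predicted) / n
    my = sum(observed) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(predicted, observed))
    sxx = sum((a - mx) ** 2 for a in predicted)
    syy = sum((b - my) ** 2 for b in observed)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def representation_factor(overlap: int, set_a: int, set_b: int, universe: int) -> float:
    """Observed overlap divided by the overlap expected for independent sets."""
    return overlap / (set_a * set_b / universe)


def hypergeom_upper_tail(overlap: int, set_a: int, set_b: int, universe: int) -> float:
    """P(X >= overlap), X ~ hypergeometric: draw set_b genes from a universe
    containing set_a 'successes'. Exact integer arithmetic via math.comb;
    math.comb returns 0 for impossible terms, so the range needs no clipping."""
    numerator = sum(math.comb(set_a, k) * math.comb(universe - set_a, set_b - k)
                    for k in range(overlap, min(set_a, set_b) + 1))
    return numerator / math.comb(universe, set_b)
```

With a universe of 10 genes, sets of 4 and 3 genes, and an overlap of 2, `hypergeom_upper_tail(2, 4, 3, 10)` is 40/120 = 1/3 and `representation_factor(2, 4, 3, 10)` is about 1.67. Under the abstract's criterion, a gene would count as accurately predicted when `pearson_r(predicted, observed) > 0.5`.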
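Identifying hubs by betweenness centrality on the merged directed graph can be sketched with Brandes' algorithm for unweighted graphs. This standard-library version returns raw (unnormalized) scores counting shortest paths between ordered node pairs; the abstract does not say whether edges were weighted or scores normalized, and NetworkX's `betweenness_centrality` exposes both options. The adjacency-list format and node names such as `KIN_A` and `TF_X` are hypothetical.

```python
from collections import deque
from typing import Dict, List


def betweenness_centrality(adj: Dict[str, List[str]]) -> Dict[str, float]:
    """Brandes' algorithm on an unweighted directed graph.

    adj maps each node to the nodes it regulates (kinase -> TF, TF -> gene).
    Scores are unnormalized: each ordered (source, target) pair contributes once.
    """
    nodes = set(adj)
    for targets in adj.values():
        nodes.update(targets)
    bc = {v: 0.0 for v in nodes}

    for s in nodes:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in nodes}
        sigma = dict.fromkeys(nodes, 0)
        dist = dict.fromkeys(nodes, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj.get(v, []):
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate pair dependencies in order of decreasing distance.
        delta = dict.fromkeys(nodes, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

In a toy cascade where two kinases both phosphorylate one TF that regulates two genes, every kinase-to-gene shortest path passes through the TF, so it is the only node with a nonzero score (4.0).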