Fusing gene expressions and transitive protein-protein interactions for inference of gene regulatory networks

Abstract

Background: Systematic fusion of multiple data sources for Gene Regulatory Network (GRN) inference remains a key challenge in systems biology. We incorporate information from protein-protein interaction networks (PPIN) into the process of GRN inference from gene expression (GE) data. However, existing PPIN remain sparse, and transitive protein interactions can help predict missing protein interactions. We therefore propose a systematic probabilistic framework for fusing GE data and transitive protein interaction data to coherently build GRN.

Results: We use a Gaussian Mixture Model (GMM) to soft-cluster GE data, allowing overlapping cluster memberships. Next, we propose a heuristic method that extends sparse PPIN by incorporating transitive linkages. We then propose a novel way to score the extended protein interactions by combining topological properties of the PPIN with correlations of GE. Following this, GE data and the extended PPIN are fused using a Gaussian Hidden Markov Model (GHMM) to identify gene regulatory pathways and refine interaction scores, which are then used to constrain the GRN structure. We employ a Bayesian Gaussian Mixture (BGM) model to refine the GRN derived from GE data, using the structural priors derived from the GHMM. Experiments on real yeast regulatory networks demonstrate both the feasibility of the extended PPIN in predicting transitive protein interactions and the effectiveness of the proposed PPIN-GE fusion method in improving the coverage and accuracy of the inferred GRN.

Conclusion: The GE and PPIN fusion model outperforms both state-of-the-art single-data-source models (CLR, GENIE3, TIGRESS) and existing fusion models under various constraints.

Similar Papers
  • Research Article
  • Cite count: 6
  • 10.1016/j.isci.2023.105927
A data-driven optimization method for coarse-graining gene regulatory networks
  • Jan 4, 2023
  • iScience
  • Cristian Caranica + 1 more

  • Research Article
  • Cite count: 2
  • 10.1158/1538-7445.tim2013-a81
Abstract A81: Global gene regulatory and protein interaction networks in breast cancer metastasis
  • Feb 1, 2013
  • Cancer Research
  • Bin Zhang + 2 more

Purpose: Invasion-metastasis cascades that underlie macroscopic metastases remain poorly understood. Tackling such complexity demands systems biology approaches that filter and integrate myriad data. To gain a fine-grained resolution of the mechanisms underlying metastasis, here we reconstruct global gene regulatory and protein interaction networks that potentially drive breast cancer metastasis. Procedures: A systematic literature review and comprehensive data mining of well-curated cancer gene datasets were carried out to uncover a core metastasis gene set (CMGS) using an in-house Boolean logic framework. Meanwhile, a gene regulatory network (GRN) was built to distinguish key cancer driver genes from passenger genes by integrating multiple large-scale genomic studies of breast cancer. Furthermore, CMGS was projected onto the latest HRPD protein interaction network (PIN) and the GRN to derive a metastasis-specific protein interaction network (mPIN) and gene regulatory network (mGRN), respectively. To gain mechanistic insights into mPIN and mGRN, a key network driver analysis was employed to identify critical hub nodes of the networks as key regulators. Results: We demonstrate that CMGS genes have significantly higher connectivity in both PIN and GRN than genes not in CMGS, suggesting their critical controlling power over the system. As expected, both mPIN and mGRN are most enriched for angiogenesis, integrin signaling, and p53 pathways. Moreover, these metastasis-specific networks are also enriched for well-known metastasis-related pathways such as Wnt, TGF-beta, and FGF pathways. Surprisingly, inflammatory pathways, such as chemokine signaling and B-cell activation pathways, are also over-represented in the networks. The top hub genes of mPIN include AR, ABL1, ESR1, AKT1, SMAD4, TP53, CSNK2A1, MAPK1, PIK3R1 and SMAD3.
This hub gene set is distinct from the top key regulators of mGRN, including ACTA2, ACTL6A, ADM, AEBP1, ARF1, ARPC4, ATM, AURKA and BIRC5. Notably, all of the top ten drivers of mGRN except ATM and AURKA lie outside CMGS, suggesting novel metastasis targets uncovered through our integrative network analysis. Indeed, ARF1, encoding the GTPase ADP-ribosylation factor 1, has been shown to play an important role in both cancer cell proliferation and migration. Moreover, the subnetwork regulated by ARF1 clearly links to key inflammatory players, especially via microvesicle-related biological processes. Importantly, both driver gene sets are tightly connected by TP53 and MAPK8 physically and genetically, suggesting a strong connection between the p53 pathway and the inflammatory response, as MAPK8 activation is pivotal for chemokine-mediated inflammation processes, especially for TNF-alpha. Conclusion: Taken together, our global metastasis-specific protein interaction and gene regulation networks and their key regulators shed light on a potential trajectory of invasion-metastasis cascades that enable the progression of breast cancer metastasis, underscoring the instrumental role of a novel signaling network involving p53, microvesicle-associated ARF1, and TNF-alpha-mediated inflammatory response pathways. Citation Format: Bin Zhang, Yongzhong Zhao, Jun Zhu. Global gene regulatory and protein interaction networks in breast cancer metastasis. [abstract]. In: Proceedings of the AACR Special Conference on Tumor Invasion and Metastasis; Jan 20-23, 2013; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2013;73(3 Suppl):Abstract nr A81.
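
The abstract does not describe how the key network driver analysis ranks nodes. As a much simpler stand-in, hub nodes are often ranked by degree (number of distinct interaction partners); the sketch below does only that. The function name `top_hubs` and the toy edge list are hypothetical and do not come from the study's networks.

```python
def top_hubs(edges, n=10):
    """Rank nodes by degree (distinct interaction partners); ties are broken by name."""
    partners = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        partners.setdefault(a, set()).add(b)
        partners.setdefault(b, set()).add(a)
    ranked = sorted(partners, key=lambda g: (-len(partners[g]), g))
    return ranked[:n]


# toy network, not the mPIN or mGRN from the study
toy_edges = [("g1", "g2"), ("g1", "g3"), ("g1", "g4"), ("g2", "g3")]
print(top_hubs(toy_edges, n=2))  # ['g1', 'g2']
```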

  • Book Chapter
  • Cite count: 6
  • 10.1007/978-3-540-88436-1_19
Fusion of Gene Regulatory and Protein Interaction Networks Using Skip-Chain Models
  • Jan 1, 2008
  • Iti Chaturvedi + 1 more

Inference of Gene Regulatory Networks (GRN) is important for understanding signal transduction pathways. This involves predicting the correct sequence of interactions and identifying all interacting genes. Gene expression data alone is insufficient, so additional data sources such as the protein-protein interaction network (PPIN) are required. In this paper, we model time-delayed interactions using a skip-chain model, which finds missing edges between non-consecutive time points based on the PPIN. The highest Viterbi approximation is used to select skip-edges. The k-skip validation model checks for k missing genes between the two genes of a predicted interaction, giving us the advantages of both validation and expansion of the GRN. The method is demonstrated on cell-division cycle data from S. cerevisiae (yeast). A comparison of the present method with a previous approach that models the PPIN using a Gibbs prior is given.
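
The k-skip validation idea can be illustrated with a bounded breadth-first search: a predicted interaction between two genes is accepted if the PPIN links them through at most k intermediate genes, and those intermediates are candidate missing genes. This is a minimal sketch under that assumption, not the skip-chain model or its Viterbi-based edge selection; `k_skip_path` and the gene labels are hypothetical.

```python
from collections import deque


def k_skip_path(ppin, src, dst, k):
    """Return a shortest PPIN path src -> dst with at most k intermediate genes, or None."""
    adj = {}
    for a, b in ppin:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    if src not in adj:
        return None
    parent = {src: None}
    queue = deque([(src, 0)])  # (gene, number of edges from src)
    while queue:
        node, depth = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to src
                path.append(node)
                node = parent[node]
            return path[::-1]
        if depth == k + 1:  # k intermediates means at most k + 1 edges
            continue
        for nxt in sorted(adj[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append((nxt, depth + 1))
    return None


toy_ppin = [("G1", "G2"), ("G2", "G3"), ("G3", "G4")]
print(k_skip_path(toy_ppin, "G1", "G3", k=1))  # ['G1', 'G2', 'G3']
print(k_skip_path(toy_ppin, "G1", "G4", k=1))  # None
```

In this sketch the genes strictly between the endpoints of a returned path are the "missing" genes that would expand the GRN.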

  • Research Article
  • Cite count: 15
  • 10.3389/fbinf.2021.777299
Intracellular and Intercellular Gene Regulatory Network Inference From Time-Course Individual RNA-Seq.
  • Nov 11, 2021
  • Frontiers in bioinformatics
  • Makoto Kashima + 4 more

Gene regulatory network (GRN) inference is an effective approach to understand the molecular mechanisms underlying biological events. Generally, GRN inference mainly targets intracellular regulatory relationships such as transcription factors and their associated targets. In multicellular organisms, there are both intracellular and intercellular regulatory mechanisms. Thus, we hypothesize that GRNs inferred from time-course individual (whole embryo) RNA-Seq during development can reveal intercellular regulatory relationships (signaling pathways) underlying the development. Here, we conducted time-course bulk RNA-Seq of individual mouse embryos during early development, followed by pseudo-time analysis and GRN inference. The results demonstrated that GRN inference from RNA-Seq with pseudo-time can be applied for individual bulk RNA-Seq similar to scRNA-Seq. Validation using an experimental-source-based database showed that our approach could significantly infer GRN for all transcription factors in the database. Furthermore, the inferred ligand-related and receptor-related downstream genes were significantly overlapped. Thus, the inferred GRN based on whole organism could include intercellular regulatory relationships, which cannot be inferred from scRNA-Seq based only on gene expression data. Overall, inferring GRN from time-course bulk RNA-Seq is an effective approach to understand the regulatory relationships underlying biological events in multicellular organisms.

  • Supplementary Content
  • 10.4225/03/59bf0bad8745a
Incorporating and Generating Prior Knowledge to Improve Gene Regulatory Network Inference
  • Sep 17, 2017
  • Figshare
  • Ajay Nair

Cells regulate gene expression and protein activity to grow and adapt to the external environment. Identifying the regulatory interactions in a cell is critical to understanding and engineering life processes. Gene regulatory network (GRN) inference is the process of reconstructing the network of regulatory interactions from experimental data using statistical or machine-learning techniques. GRN inference remains an unsolved grand challenge. Incorporating prior knowledge into GRN inference is a promising approach proposed in the literature for accurate GRN reconstruction. The reported methods of incorporating prior knowledge (termed priors) have limitations. Firstly, current methods focus on knowledge of the presence of interactions between genes (edge priors). Secondly, only a few methods are known to incorporate priors, and they incorporate them 'before' inference; thus, many high-performing methods are not known to incorporate priors. Thirdly, priors exist only for a few well-studied organisms. The thesis demonstrated that edge priors provide only a limited improvement in the accuracy of GRN inference. It proposed and demonstrated that prior knowledge of the absence of interactions between genes (non-edge priors) significantly improves overall accuracy: specificity, precision, and F1-score improved by 2-10%, 5-40%, and 5-12%, respectively. A method to generate around 70% of non-edge priors was also demonstrated. This thesis analysed the maxP technique, which is widely used to reduce computational time, and identified its limitations. Two algorithms that overcome these limitations but retain the strengths of maxP, by incorporating GRN topology priors 'during' inference, were proposed and developed. The theoretical and experimental results showed that these algorithms take only one-third of the normal computational time without sacrificing accuracy.
The thesis proposed and developed two algorithms that integrate priors 'after' the GRN inference process. Further, a method to identify and remove wrong interactions by using priors was proposed and developed. The results showed that accuracy improved and errors decreased: around 970 additional correct edges were obtained and 1300 wrong interactions were removed when half of the total priors were incorporated, compared with a normal GRN inference. Moreover, the limitation that only a few GRN inference methods can incorporate priors is overcome. A generic mapping pipeline was developed for predicting regulatory interactions, with confidence ranks, in an organism by using the known regulatory interactions from another organism. This mapping pipeline was used to predict 20,280 regulatory interactions in 30 strains of cyanobacteria, which are less studied but scientifically and industrially relevant. A database of these regulatory interactions, RegCyanoDB, was developed and made available for public access. Thus, this thesis has focused on developing efficient methods for incorporating priors into GRN inference and generating priors for less-studied organisms. The thesis demonstrated that non-edge priors are significant for methods that apply priors 'before' inference. Further, methods that apply priors 'during' and 'after' inference were proposed and developed. A bioinformatic pipeline to predict regulatory interactions in less-studied organisms was also developed and applied.
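
The idea of integrating priors 'after' inference can be sketched as a filter over a ranked edge list: edges contradicted by non-edge priors are removed, and edges confirmed by edge priors are promoted. This is a hypothetical illustration (`apply_priors_after`, `precision`, and the toy edges and gold standard are assumptions), not the thesis's algorithms.

```python
def apply_priors_after(ranked_edges, edge_priors, non_edge_priors, top_n):
    """Post-process a ranked (regulator, target) list using priors (hypothetical scheme)."""
    # drop edges that a non-edge prior says cannot exist
    kept = [e for e in ranked_edges if e not in non_edge_priors]
    # move prior-confirmed edges to the front, otherwise preserving the original ranking
    confirmed = [e for e in kept if e in edge_priors]
    unconfirmed = [e for e in kept if e not in edge_priors]
    return (confirmed + unconfirmed)[:top_n]


def precision(predicted, gold):
    """Fraction of predicted edges that appear in the gold-standard set."""
    return sum(e in gold for e in predicted) / len(predicted) if predicted else 0.0


# toy ranking from some inference method, best first
ranked = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")]
gold = {("A", "B"), ("B", "C")}
before = ranked[:2]
after = apply_priors_after(ranked, edge_priors={("B", "C")},
                           non_edge_priors={("A", "C")}, top_n=2)
print(precision(before, gold), precision(after, gold))  # 0.5 1.0
```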

  • Book Chapter
  • Cite count: 1
  • 10.4018/978-1-5225-8903-7.ch010
Inference of Gene Regulatory Networks by Topological Prior Information and Data Integration
  • Jan 1, 2019
  • Biotechnology
  • David Correa Martins Jr + 2 more

The inference of Gene Regulatory Networks (GRNs) is a very challenging problem that has attracted increasing attention since the development of high-throughput sequencing and gene expression measurement technologies. Many models and algorithms have been developed to identify GRNs using mainly gene expression profiles as the data source. As gene expression data usually have a limited number of samples and inherent noise, integrating gene expression with several other sources of information can be vital for accurately inferring GRNs. For instance, prior information about the overall topological structure of the GRN can guide inference techniques toward better results. In addition to gene expression data, biological information from heterogeneous data sources has recently been integrated by GRN inference methods as well. The objective of this chapter is to present an overview of GRN inference models and techniques, with a focus on incorporating prior information, such as global and local topological features, and on integrating several heterogeneous data sources.

  • Book Chapter
  • Cite count: 2
  • 10.4018/978-1-5225-0353-8.ch001
Inference of Gene Regulatory Networks by Topological Prior Information and Data Integration
  • Jan 1, 2016
  • David Correa Martins Jr + 2 more

The inference of Gene Regulatory Networks (GRNs) is a very challenging problem that has attracted increasing attention since the development of high-throughput sequencing and gene expression measurement technologies. Many models and algorithms have been developed to identify GRNs using mainly gene expression profiles as the data source. As gene expression data usually have a limited number of samples and inherent noise, integrating gene expression with several other sources of information can be vital for accurately inferring GRNs. For instance, prior information about the overall topological structure of the GRN can guide inference techniques toward better results. In addition to gene expression data, biological information from heterogeneous data sources has recently been integrated by GRN inference methods as well. The objective of this chapter is to present an overview of GRN inference models and techniques, with a focus on incorporating prior information, such as global and local topological features, and on integrating several heterogeneous data sources.

  • Research Article
  • Cite count: 2
  • 10.1166/asl.2018.12979
An Improved Integrative Random Forest for Gene Regulatory Network Inference for Breast Cancer
  • Oct 1, 2018
  • Advanced Science Letters
  • Suntharaamurthi Chandran + 5 more

Gene Regulatory Network (GRN) inference aims to capture the regulatory influences between genes and the regulatory events in the GRN. Integrative Random Forest for Gene Regulatory Network Inference (iRafNet) is a random forest (RF)-based algorithm that performs well in constructing GRNs by integrating multiple data types. Most existing approaches serve their purpose, but they have limitations that keep them from reaching optimal performance, and they need a very long computational time to construct a GRN. Moreover, the datasets contain redundant genes. GRNs inferred by existing methods have lower accuracy on benchmark and real datasets, and the computational time needed to produce them is very long. To overcome these issues, we propose improving the existing method by adding a gene selection step. To do so, the existing methods were studied and their performance in constructing GRNs was analyzed. Improved iRafNet was designed and developed to reduce the computational time needed to infer the GRN from the dataset. Finally, the accuracy and computational time of the proposed method were validated and verified on benchmark and real datasets. Improved iRafNet achieves a higher AUC and a lower computational time.
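
Improved iRafNet is evaluated by AUC. For a ranked list of predicted regulatory edges, the area under the ROC curve equals the probability that a randomly chosen true edge is scored above a randomly chosen non-edge, with ties counted as one half. The minimal pairwise sketch below follows that definition directly; it is quadratic in the number of edges and meant for illustration only, and the name `auroc` and the toy scores are hypothetical.

```python
def auroc(scores, labels):
    """Area under the ROC curve via its rank interpretation (Mann-Whitney U / (n_pos * n_neg))."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one true edge and one non-edge")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# scores for four candidate edges; labels from a gold standard (toy values)
print(auroc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```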

  • Research Article
  • Cite count: 10
  • 10.1186/s12864-017-4131-6
An uncertain model-based approach for identifying dynamic protein complexes in uncertain protein-protein interaction networks
  • Oct 1, 2017
  • BMC Genomics
  • Yijia Zhang + 4 more

Background: Recently, researchers have tried to integrate various dynamic information with static protein-protein interaction (PPI) networks to construct dynamic PPI networks. The shift from static to dynamic PPI networks is essential to reveal cellular function and organization. However, it is still impossible to construct an absolutely reliable dynamic PPI network because of the noise and incompleteness of high-throughput experimental data.

Results: To deal with uncertain data, several uncertain graph models and theories have been proposed to analyze social networks, electrical networks and biological networks. In this paper, we construct dynamic uncertain PPI networks that integrate the dynamic information of gene expression with the topology information of high-throughput PPI data. The dynamic uncertain PPI networks not only provide the dynamic properties of PPI, which static PPI networks neglect, but also distinguish the reliability of each protein and PPI by its existence probability. We then use the uncertain model to identify dynamic protein complexes in the dynamic uncertain PPI networks.

Conclusion: We use gene expression data and different high-throughput PPI data to construct three dynamic uncertain PPI networks. Our approach achieves state-of-the-art performance on all three dynamic uncertain PPI networks. The experimental results show that our approach can effectively deal with the uncertain data in dynamic uncertain PPI networks and improve the performance of protein complex identification.
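
In an uncertain PPI network each interaction carries an existence probability. One simple quantity for scoring a candidate protein complex is its expected density: by linearity of expectation, the expected number of edges among the members is the sum of their edge probabilities, with no independence assumption between edges. This is an illustrative sketch, not the paper's uncertain model; `expected_density` and the probabilities are hypothetical.

```python
from itertools import combinations


def expected_density(members, edge_prob):
    """Expected fraction of member pairs that interact in an uncertain PPI network.
    edge_prob maps frozenset({u, v}) -> existence probability; missing pairs count as 0.
    Linearity of expectation means edges need not be independent."""
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return 0.0
    expected_edges = sum(edge_prob.get(frozenset(pair), 0.0) for pair in pairs)
    return expected_edges / len(pairs)


# toy existence probabilities (illustrative only)
probs = {
    frozenset(("P1", "P2")): 0.9,
    frozenset(("P2", "P3")): 0.6,
    frozenset(("P1", "P3")): 0.3,
}
print(round(expected_density({"P1", "P2", "P3"}, probs), 3))  # 0.6
```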

  • Research Article
  • Cite count: 1
  • 10.1016/j.brainres.2024.149276
In-depth investigation of the complex pathophysiological mechanisms between diabetes and ischemic stroke through gene expression and regulatory network analysis
  • Oct 22, 2024
  • Brain Research
  • Ling Lin + 13 more

  • Research Article
  • Cite count: 4
  • 10.1089/cmb.2016.0199
PEAK: Integrating Curated and Noisy Prior Knowledge in Gene Regulatory Network Inference.
  • Mar 15, 2017
  • Journal of Computational Biology
  • Doaa Altarawy + 2 more

With the abundance of biological data, computational prediction of gene regulatory networks (GRNs) from gene expression data has become more feasible. Although incorporating other prior knowledge (PK) along with gene expression data greatly improves prediction accuracy, the overall accuracy is still low. PK in GRN inference can be categorized as noisy or curated. In noisy PK, such as transcription factor binding and protein-protein interactions, relations between genes do not necessarily correspond to regulatory relations and are thus considered inaccurate by inference algorithms. In contrast, curated PK consists of experimentally verified regulatory interactions from pathway databases. An issue in real data is that gene expression can poorly support the curated PK, so most existing prediction algorithms cannot use it. Although several algorithms have been proposed to incorporate noisy PK, none addresses curated PK with poor gene expression support. We present PEAK, a system to integrate both curated and noisy PK in GRN inference, especially with poor gene expression support. We introduce a novel method for GRN inference, CurInf, to effectively integrate curated PK even when the gene expression data poorly support it. PEAK also uses the previously proposed Modified Elastic Net method to incorporate noisy PK, which we call NoisInf. In our experiments, CurInf effectively incorporates curated PK that previous methods regarded as noise. Using 100% curated PK, CurInf improves the area under the precision-recall curve over NoisInf by 27.3% on synthetic data, 86.5% on Escherichia coli data, and 31.1% on Saccharomyces cerevisiae data. Moreover, even when the noise in PK is 10 times greater than the true PK, PEAK performs better than inference without any PK. Better integration of curated PK helps biologists benefit from verified experimental data to predict more reliable GRNs.
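
The abstract does not spell out Modified Elastic Net. A common way to let noisy prior knowledge shape a regression-based GRN model is a per-feature penalty factor: candidate regulators supported by PK receive a smaller penalty and are therefore shrunk less. The coordinate-descent sketch below implements that weighted elastic net objective for one target gene; it is an assumption about the general approach, not PEAK's implementation, and `weighted_elastic_net` and `penalty_factor` are hypothetical names.

```python
def soft_threshold(z, gamma):
    """Proximal operator of gamma * |b|."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def weighted_elastic_net(X, y, lam, alpha, penalty_factor, n_iter=200):
    """Coordinate descent for
        (1 / 2n) * ||y - X b||^2 + lam * sum_j w_j * (alpha * |b_j| + (1 - alpha) / 2 * b_j^2),
    with w_j = penalty_factor[j]. X: n samples x p candidate regulators; y: target gene."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    resid = list(y)  # y - X b, with b = 0
    for _ in range(n_iter):
        for j in range(p):
            col = [X[i][j] for i in range(n)]
            # correlation of regulator j with the residual that excludes j's own contribution
            rho = sum(c * (r + c * b[j]) for c, r in zip(col, resid)) / n
            denom = sum(c * c for c in col) / n + lam * (1 - alpha) * penalty_factor[j]
            if denom == 0:
                continue  # all-zero, unpenalized column: leave coefficient unchanged
            new_bj = soft_threshold(rho, lam * alpha * penalty_factor[j]) / denom
            delta = new_bj - b[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                b[j] = new_bj
    return b


# toy data: the target follows regulator 0 exactly; regulator 0 is "supported by PK" (w = 0)
X = [[1, 0], [0, 1], [1, 1], [-1, 0]]
y = [2, 0, 2, -2]
print(weighted_elastic_net(X, y, lam=10, alpha=1.0, penalty_factor=[1, 1]))  # [0.0, 0.0]
print(weighted_elastic_net(X, y, lam=10, alpha=1.0, penalty_factor=[0, 1]))  # [2.0, 0.0]
```

With the same strong penalty, the regulator whose penalty factor is lowered by prior knowledge survives while the unsupported one is shrunk to zero.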

  • Research Article
  • Cite count: 18
  • 10.1109/access.2019.2936794
IMTF-GRN: Integrative Matrix Tri-Factorization for Inference of Gene Regulatory Networks
  • Jan 1, 2019
  • IEEE Access
  • Nisar Wani + 1 more

Gene Regulatory Network (GRN) inference using computational approaches has been a highly pursued problem in bioinformatics. Various approaches, including statistical, machine learning and information-theoretic methods, have been developed to infer GRNs from gene expression data. However, a large number of regulatory relationships remain unpredicted even in highly studied model organisms such as Escherichia coli and Saccharomyces cerevisiae. Moreover, the regulatory relationships in higher eukaryotes with large genomes, such as humans and mice, remain mostly unexplored. The majority of approaches proposed in the GRN literature infer molecular interactions from gene expression data alone, even though gene expression regulation is a product of sequential interactions among multiple biological processes. To capture more regulatory relationships with higher precision, we apply a data fusion and inference model based on non-negative matrix tri-factorization, called integrative matrix tri-factorization for GRN inference (iMTF-GRN), which can integrate diverse types of biological data in a relational learning framework. We demonstrate that the iMTF-GRN model shows improved accuracy in predicting TF-target and miRNA-target gene regulations and performs better than other state-of-the-art methods.
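
iMTF-GRN builds on non-negative matrix tri-factorization, which approximates a non-negative matrix `X` as `G S H^T` with non-negative factors. The sketch below shows only this single-matrix building block, using standard multiplicative updates that do not increase the squared Frobenius error; the integrative, multi-relation model described in the abstract is not reproduced, and all function names and the toy matrix are hypothetical.

```python
import random


def matmul(A, B):
    """Dense matrix product of nested lists."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def mult_update(M, num, den, eps=1e-12):
    """Element-wise M * num / den (eps avoids division by zero)."""
    return [[m * n / (d + eps) for m, n, d in zip(mr, nr, dr)]
            for mr, nr, dr in zip(M, num, den)]


def reconstruction_error(X, G, S, H):
    """Squared Frobenius norm of X - G S H^T."""
    R = matmul(matmul(G, S), transpose(H))
    return sum((x - r) ** 2 for xr, rr in zip(X, R) for x, r in zip(xr, rr))


def nmtf(X, k1, k2, n_iter=200, seed=0):
    """Approximate non-negative X (n x m) as G S H^T, G (n x k1), S (k1 x k2), H (m x k2) >= 0."""
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    G = [[rng.random() for _ in range(k1)] for _ in range(n)]
    S = [[rng.random() for _ in range(k2)] for _ in range(k1)]
    H = [[rng.random() for _ in range(k2)] for _ in range(m)]
    Xt = transpose(X)
    for _ in range(n_iter):
        B = matmul(S, transpose(H))  # k1 x m, so X ~ G B
        G = mult_update(G, matmul(X, transpose(B)), matmul(matmul(G, B), transpose(B)))
        C = matmul(G, S)  # n x k2, so X^T ~ H C^T
        H = mult_update(H, matmul(Xt, C), matmul(matmul(H, transpose(C)), C))
        Gt = transpose(G)
        num = matmul(matmul(Gt, X), H)
        den = matmul(matmul(matmul(Gt, G), S), matmul(transpose(H), H))
        S = mult_update(S, num, den)
    return G, S, H


X = [[1, 0, 2], [0, 1, 1], [2, 1, 5], [1, 1, 3]]  # toy non-negative relation matrix
print(round(reconstruction_error(X, *nmtf(X, 2, 2)), 4))
```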

  • Research Article
  • 10.1093/bioinformatics/btaf120
Topology-based metrics for finding the optimal sparsity in gene regulatory network inference.
  • Mar 24, 2025
  • Bioinformatics (Oxford, England)
  • Nils Lundqvist + 3 more

Gene regulatory network (GRN) inference is a complex task aiming to unravel regulatory interactions between genes in a cell. A major shortcoming of most GRN inference methods is that they do not attempt to find the optimal sparsity, i.e. the single best GRN, which is important when applying GRN inference in a real situation. Instead, the sparsity tends to be controlled by an arbitrarily set hyperparameter. In this paper, two new methods for predicting the optimal sparsity of GRNs are formulated and benchmarked on simulated perturbation-based gene expression data using four GRN inference methods: LASSO, Zscore, LSCON, and GENIE3. Both sparsity prediction methods are defined using the hypothesis that the topology of real GRNs is scale-free, and are evaluated based on their ability to predict the sparsity of the true GRN. The results show that the new topology-based approaches reliably predict a sparsity close to the true one. This ability is valuable for real-world applications where a single GRN is inferred from real data. In such situations, it is vital to be able to infer a GRN with the correct sparsity. https://bitbucket.org/sonnhammergrni/powerlaw_sparsity/ and https://codeocean.com/capsule/4393635/.
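
Both sparsity-prediction methods rest on the hypothesis that real GRNs are scale-free. One simple, hypothetical way to turn that hypothesis into a criterion is to keep the top-k edges for several candidate k, fit a straight line to log(frequency) against log(degree), and prefer the k with the best fit (highest R^2 with a negative slope). This is not either of the paper's two metrics; `loglog_r2`, `degree_freq` and `choose_sparsity` are illustrative names.

```python
import math
from collections import Counter


def loglog_r2(degree_freq):
    """R^2 of a least-squares line through (log degree, log frequency).
    Returns -inf with fewer than 3 distinct degrees or a non-negative slope."""
    if len(degree_freq) < 3:
        return float("-inf")
    xs = [math.log(k) for k in degree_freq]
    ys = [math.log(degree_freq[k]) for k in degree_freq]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    slope = sxy / sxx
    if slope >= 0 or ss_tot == 0:
        return float("-inf")  # not a decaying power law
    ss_res = sum((y - (my + slope * (x - mx))) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / ss_tot


def degree_freq(edges):
    """Map each degree to the number of genes having that degree."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return Counter(deg.values())


def choose_sparsity(ranked_edges, sizes):
    """Number of top-ranked edges whose degree distribution looks most scale-free, or None."""
    best = max(sizes, key=lambda k: loglog_r2(degree_freq(ranked_edges[:k])))
    return best if loglog_r2(degree_freq(ranked_edges[:best])) > float("-inf") else None


print(round(loglog_r2({1: 8, 2: 4, 4: 2, 8: 1}), 6))  # 1.0
```

Returning `None` when no candidate shows a decaying log-log trend avoids silently picking an arbitrary sparsity.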

  • Research Article
  • Cite count: 6
  • 10.1101/2023.08.01.551575
Continuous lifelong learning for modeling of gene regulation from single cell multiome data by leveraging atlas-scale external data
  • Aug 3, 2023
  • bioRxiv
  • Qiuyue Yuan + 1 more

Accurate inference of context-specific Gene Regulatory Networks (GRNs) from genomics data is a crucial task in computational biology. However, existing methods face limitations, such as reliance on gene expression data alone, lower resolution from bulk data, and data scarcity for specific cellular systems. Despite recent technological advancements, including single-cell sequencing and the integration of ATAC-seq and RNA-seq data, learning such complex mechanisms from limited independent data points still presents a daunting challenge, impeding GRN inference accuracy. To overcome this challenge, we present LINGER (LIfelong neural Network for GEne Regulation), a novel deep learning-based method to infer GRNs from single-cell multiome data with paired gene expression and chromatin accessibility data from the same cell. LINGER incorporates both 1) atlas-scale external bulk data across diverse cellular contexts and 2) the knowledge of transcription factor (TF) motif matching to cis-regulatory elements as a manifold regularization to address the challenge of limited data and extensive parameter space in GRN inference. Our results demonstrate that LINGER achieves 2–3-fold higher accuracy than existing methods. LINGER reveals a complex regulatory landscape of genome-wide association studies, enabling enhanced interpretation of disease-associated variants and genes. Additionally, following GRN inference from reference sc-multiome data, LINGER allows the estimation of TF activity solely from bulk or single-cell gene expression data, leveraging the abundance of available gene expression data to identify driver regulators from case-control studies. Overall, LINGER provides a comprehensive tool for robust gene regulation inference from genomics data, empowering deeper insights into cellular mechanisms.

  • Research Article
  • Cite count: 3
  • 10.1371/journal.pone.0206976
Ambiguity in logic-based models of gene regulatory networks: An integrative multi-perturbation analysis
  • Nov 20, 2018
  • PLoS ONE
  • Amir Reza Alizad-Rahvar + 1 more

Most studies of gene regulatory network (GRN) inference have focused extensively on identifying the interaction map of the GRNs. However, to predict cellular behavior, it is necessary to model the GRN in terms of logic circuits, i.e., Boolean networks. Perturbation techniques, e.g., knock-down and over-expression, should be used to identify the underlying logic behind the interactions. However, we show that by using only transcriptomic data obtained from single-perturbation experiments, we cannot observe all regulatory interactions, and this invisibility causes ambiguity in our model. Consequently, we need to employ data from multiple omics layers (genome, transcriptome, and proteome) as well as multiple perturbation experiments to reduce or eliminate ambiguity in our modeling. In this paper, we introduce a multi-step perturbation experiment to deal with ambiguity. Moreover, we perform a thorough analysis to investigate which types of perturbations and omics layers play the most important role in the unambiguous modeling of GRNs, and how much ambiguity is eliminated by considering more perturbations and more omics layers. Our analysis shows that performing both knock-down and over-expression is necessary to achieve the least ambiguous model. Moreover, the more perturbation steps are taken, the more ambiguity is eliminated. In addition, we can even achieve an unambiguous model of the GRN by using multi-step perturbation and integrating transcriptomic, protein-protein interaction, and cis-element data. Finally, we demonstrate the effect of using different types of perturbation experiments and integrating multi-omics data on identifying the logic behind the regulatory interactions in a synthetic GRN. In conclusion, relying only on the results of knock-down experiments and not including as many omics layers as possible in GRN inference makes the results ambiguous, unreliable, and less accurate.
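
The ambiguity argument can be reproduced on a toy example. Assume a target gene is controlled by regulators A and B through an unknown two-input Boolean function, and that in the wild type A is on and B is off. Knock-downs alone only ever show the target's response to inputs (1, 0) and (0, 0), which leaves 4 of the 16 possible functions consistent; adding over-expression of B reveals input (1, 1) and halves that number. The true function (AND), the wild-type state and the helper names are all assumptions made for illustration.

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))  # (A, B) states: (0,0), (0,1), (1,0), (1,1)
# All 16 two-input Boolean functions, each stored as a truth table {inputs: output}.
ALL_FUNCTIONS = [dict(zip(INPUTS, outs)) for outs in product((0, 1), repeat=4)]


def observe(true_f, wild_type, perturbations):
    """Simulate experiments; each perturbation pins regulators (index -> 0 knock-down, 1 over-expression)."""
    observations = {}
    for pert in perturbations:
        state = list(wild_type)
        for idx, value in pert.items():
            state[idx] = value
        observations[tuple(state)] = true_f[tuple(state)]
    return observations


def consistent(observations):
    """Candidate functions that reproduce every observed input -> output pair."""
    return [f for f in ALL_FUNCTIONS if all(f[x] == y for x, y in observations.items())]


true_and = {x: x[0] & x[1] for x in INPUTS}
wild_type = (1, 0)
knock_downs = [{}, {0: 0}, {1: 0}]   # wild type, KD of A, KD of B
over_expressions = [{0: 1}, {1: 1}]  # OE of A, OE of B

print(len(consistent(observe(true_and, wild_type, knock_downs))))                     # 4
print(len(consistent(observe(true_and, wild_type, knock_downs + over_expressions))))  # 2
```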
