Advances and applications of machine learning and intelligent optimization algorithms in genome-scale metabolic network models
Due to the increasing demand for microbially manufactured products in various industries, it has become important to find optimal designs for microbial cell factories. Metabolic engineering strategies, such as knocking out competing pathways and introducing exogenous pathways, redirect metabolic flow and change flux magnitudes to increase the yield of desired products. Recently, with the gradual cross-fertilization between computer science and bioinformatics, machine learning and intelligent optimization-based approaches have received much attention in genome-scale metabolic network models (GSMMs) based on constrained optimization methods, and many high-quality related works have been published. Therefore, this paper focuses on the advances and applications of machine learning and intelligent optimization algorithms in metabolic engineering, with special emphasis on GSMMs. Specifically, the development history of GSMMs is first reviewed. Then, the analysis methods of GSMMs based on constrained optimization are presented. Next, this paper reviews the development and application of machine learning and intelligent optimization algorithms in genome-scale metabolic models. In addition, the research gaps and future research potential of machine learning and intelligent optimization methods applied to GSMMs are discussed.
- # Genome-scale Metabolic Models
- # Genome-scale Metabolic Network Models
- # Intelligent Optimization Algorithms
- # Applications Of Machine Learning
- # Potential In Machine Learning
- # Metabolic Models
- # Constrained Optimization Methods
- # Machine Learning Algorithms
- # Intelligent Algorithms
- # Applications Of Learning
- Research Article
2
- 10.4155/pbp.13.37
- Oct 1, 2013
- Pharmaceutical Bioprocessing
Biotechnology is currently evolving through the era of big data, thanks to advances in high-throughput technologies for rapid and inexpensive genome sequencing and other genome-wide studies [1]. With the daunting amount of data, it has been possible to put them together into a coherently organized biological network that provides counterintuitive insights on biological systems [2]. Among such biological networks, a genome-scale metabolic network model is expected to play an increasingly important role in the biopharmaceutical industry [3]. Before enumerating their specific strengths, it is important to note that principles underlying genome-scale metabolic network models are consistent with the holistic perspective of systems biology, the aim of which is to unveil hidden factors causing diseases and to find relevant treatment strategies [4]. Despite the importance of metabolism in a biological system, studies on diseases in relation to metabolism were far fewer in number than those performed on signaling and transcriptional regulatory networks [5]. However, metabolism, highly linked with observable phenotypes, is a biological network that is more comprehensively characterized when compared with the other two types of networks [6]. Metabolism is, therefore, amenable to large-scale mathematical modeling and simulation. It is with this motivation that genome-scale metabolic simulation deserves more attention in drug discovery campaigns and optimization of a host strain for the production of biopharmaceuticals. Reconstruction and application of genome-scale metabolic network models have been forged as a major research strategy of systems biology. Over the last decade, genome-scale metabolic models have been built for almost all biologically important organisms across the domains of archaea, bacteria and eukaryotes [3].
They range from simple microorganisms such as Escherichia coli [7] and Saccharomyces cerevisiae [8] to higher organisms including Chinese hamster ovary (CHO) cells [9,10] and a generic human cell [11,12]. It should be noted that all these organisms that have been subjected to metabolic modeling are important cellular hosts for biopharmaceutical production or medically meaningful organisms that need to be cured (e.g., specific cancer cells) or destroyed (e.g., pathogens). A recent notable development in genome-scale metabolic modeling is the newly updated human metabolic network Recon 2 [12]. Recon 2 is the result of efforts by a group of researchers who went through a vast amount of literature and biochemical data and reconciled conflicting information. The scope of the genome-scale metabolic models reconstructed so far manifests high expectations for their potential contributions to the biopharmaceutical industry. Genome-scale metabolic network models are not just a simple pileup of biochemical reactions, but allow mathematical simulation under precisely defined constraints [13]. Once the experimentally
Applications of genome-scale metabolic network models in the biopharmaceutical industry
- Research Article
2
- 10.1016/j.bej.2023.108947
- Apr 25, 2023
- Biochemical Engineering Journal
Reconstruction and analysis of a genome-scale metabolic model for the gut bacteria Prevotella copri
- Research Article
151
- 10.1186/1471-2105-7-512
- Nov 23, 2006
- BMC Bioinformatics
Background: The availability of genome sequences for many organisms enabled the reconstruction of several genome-scale metabolic network models. Currently, significant efforts are put into the automated reconstruction of such models. For this, several computational tools have been developed that particularly assist in identifying and compiling the organism-specific lists of metabolic reactions. In contrast, the last step of the model reconstruction process, which is the definition of the thermodynamic constraints in terms of reaction directionalities, still needs to be done manually. No computational method exists that allows for an automated and systematic assignment of reaction directions in genome-scale models. Results: We present an algorithm that – based on thermodynamics, network topology and heuristic rules – automatically assigns reaction directions in metabolic models such that the reaction network is thermodynamically feasible with respect to the production of energy equivalents. It first exploits all available experimentally derived Gibbs energies of formation to identify irreversible reactions. As these thermodynamic data are not available for all metabolites, in a next step, further reaction directions are assigned on the basis of network topology considerations and thermodynamics-based heuristic rules. Briefly, the algorithm identifies reaction subsets from the metabolic network that are able to convert low-energy co-substrates into their high-energy counterparts and thus net produce energy. Our algorithm aims at disabling such thermodynamically infeasible cyclic operation of reaction subnetworks by assigning reaction directions based on a set of thermodynamics-derived heuristic rules. We demonstrate our algorithm on a genome-scale metabolic model of E. coli.
The introduced systematic direction assignment yielded 130 irreversible reactions (out of 920 total reactions), which corresponds to about 70% of all irreversible reactions that are required to disable thermodynamically infeasible energy production. Conclusion: Although not fully comprehensive, our algorithm for systematic reaction direction assignment could define a significant number of irreversible reactions automatically with low computational effort. We envision that the presented algorithm is a valuable part of a computational framework that assists the automated reconstruction of genome-scale metabolic models.
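The first step described in this abstract, using Gibbs energies of formation to flag irreversible reactions, can be illustrated with a minimal sketch. This is not the authors' algorithm: the metabolite names, the energy values, the 30 kJ/mol cutoff and the function names are all hypothetical, and the topology-based heuristic rules are left out entirely.

```python
# Hypothetical Gibbs energies of formation in kJ/mol (placeholder values,
# not measured constants). Metabolite "D" is deliberately missing.
FORMATION_ENERGY = {"A": -100.0, "B": -150.0, "C": -110.0}


def reaction_gibbs_energy(stoich, formation_energy):
    """Return sum(nu_i * dfG_i), or None if any metabolite lacks data.

    stoich maps metabolite -> stoichiometric coefficient
    (negative for substrates, positive for products).
    """
    if any(met not in formation_energy for met in stoich):
        return None
    return sum(nu * formation_energy[met] for met, nu in stoich.items())


def assign_direction(stoich, formation_energy, threshold=30.0):
    """Classify a reaction by its Gibbs energy change.

    The threshold (kJ/mol) is an assumed cutoff for this sketch only.
    """
    drg = reaction_gibbs_energy(stoich, formation_energy)
    if drg is None:
        return "unassigned"      # would be left to heuristic rules
    if drg < -threshold:
        return "forward"         # irreversible as written
    if drg > threshold:
        return "backward"        # irreversible in the reverse direction
    return "reversible"


reactions = {
    "R1": {"A": -1, "B": 1},     # A -> B
    "R2": {"A": -1, "C": 1},     # A -> C
    "R3": {"B": -1, "A": 1},     # B -> A
    "R4": {"A": -1, "D": 1},     # A -> D, no data for D
}
directions = {
    name: assign_direction(stoich, FORMATION_ENERGY)
    for name, stoich in reactions.items()
}
print(directions)
```

Reactions whose metabolites lack thermodynamic data stay unassigned here, which mirrors the abstract's point that a second, rule-based step is needed for them.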
- Research Article
33
- 10.1016/j.ymben.2021.06.005
- Jun 24, 2021
- Metabolic Engineering
Integrating thermodynamic and enzymatic constraints into genome-scale metabolic models
- Research Article
42
- 10.1007/s00253-022-12066-y
- Jul 13, 2022
- Applied Microbiology and Biotechnology
Over the last two decades, thousands of genome-scale metabolic network models (GSMMs) have been constructed. These GSMMs have been widely applied in various fields, ranging from network interaction analysis to cell phenotype prediction. However, due to the lack of constraints, the prediction accuracy of first-generation GSMMs was limited. To overcome these limitations, the next-generation GSMMs were developed by integrating omics data, adding constraint conditions, integrating different biological models, and constructing whole-cell models. Here, we review recent advances of GSMMs from the first generation to the next generation. Then, we discuss the major applications of GSMMs in industrial biotechnology, such as predicting phenotypes and guiding metabolic engineering. In addition, human health applications, including understanding biological mechanisms and discovering biomarkers and drug targets, are also summarized. Finally, we address the challenges and propose new trends for GSMMs. KEY POINTS: • This mini-review updates the literature on almost all published GSMMs since 1999. • Detailed insights into the development of the first- and next-generation GSMMs. • The application of GSMMs is summarized, and the prospects of integrating machine learning are emphasized.
- Research Article
18
- 10.1038/s41598-020-64721-x
- May 8, 2020
- Scientific Reports
Zymomonas mobilis ZM4 has recently been used for a variety of biotechnological purposes. To rationally enhance its metabolic performance, a reliable genome-scale metabolic network model (GEM) of this organism is required. To this end, we reconstructed a genome-scale metabolic model (iHN446) for Z. mobilis, which involves 446 genes, 859 reactions, and 894 metabolites. We started by first reconciling the existing GEMs previously constructed for Z. mobilis to obtain a draft network. Next, recent gene annotations, up-to-date literature, physiological data and biochemical databases were used to upgrade the network. Afterward, the draft network went through a curative and iterative process of gap-filling by computational tools and manual refinement. The final model was evaluated using experimental data and literature information. We next applied this model as a platform for analyzing the links between transcriptome-flux and transcriptome-metabolome. We found that experimental observations were in agreement with the predicted results from our final GEM. Taken together, this comprehensive model (iHN446) can be utilized for studying metabolism in Z. mobilis and finding rational targets for metabolic engineering applications.
- Research Article
2
- 10.1360/tb-2020-1468
- Feb 3, 2021
- Chinese Science Bulletin
The genome-scale metabolic network model (GEM) is a mathematical framework based on gene-protein-reaction associations combined with stoichiometric balance and is capable of facilitating the computation and prediction of multiscale phenotypes by optimizing the objective function of interest. It has been increasingly used as an important tool for understanding cellular metabolism and characterizing cell phenotypes. In cells, metabolism is tightly controlled by intricate regulatory mechanisms at the different system levels and is strictly regulated to ensure the dynamic adaptation of biochemical reaction fluxes for maintaining cell homeostasis to ultimately achieve optimal metabolic fitness. Advances in high-throughput screening and analysis technologies have generated massive amounts of genome sequences, along with transcriptomic, proteomic and metabolomic data, providing quantitative regulatory information to gain insights into cellular metabolism; however, integrating the available omics data into constraint-based metabolic models and quantitatively profiling genotype-phenotype relationships remains an outstanding challenge for computational biology. Here, we describe the recent developments in introducing macromolecular expression into GEMs and generating metabolic expression (ME) models, which increase the complexity and predictive capability of computational frameworks. Various algorithms employ different approaches to combine additional layers of omics data to limit the cone of allowable flux distributions in the metabolic model. In this review, we categorize all methods by five different grouping criteria and evaluate their practical perspectives. The first category of methods utilizes a threshold to distinguish active and inactive states of the corresponding reactions based on the gene expression measurement data. 
The second uses omics data to build cell- and tissue-specific models of human metabolism by removing unexpressed reactions from the global human metabolic network. The third category of methods involves modifying reaction bounds on the basis of mRNA and protein abundance, in which the width of the “flux cone” is adjusted via the maximum possible flux in the upper bound of the FBA optimization problem dependent on gene and protein expression levels. The imposition of constraints further defines the associated solution space of the model to improve the prediction accuracy. The fourth model incorporates transcriptional regulation networks (TRNs), which describe the phenomenological interactions between different biomolecules in response to genetic and environmental perturbation, into GEMs and avoids the obstacles of information formulation to achieve comprehensive knowledge regarding the metabolic and regulatory events occurring inside the cell. The last category integrates time-series transcriptomics data with flux-based bilevel optimization to comprehend the interplay between metabolism and regulation in time-dependent processes. We compare the advantages and limitations of different categories and explore the application areas of integrated models in analyzing metabolic characteristics, interpreting phenotypic states and the consequences of environmental and genetic perturbations while discovering potential drug targets and screening anti-metabolic drugs for cancer treatment. Finally, we also highlight the future perspectives and challenges for GEM-based reconstruction with omics data integration.
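The third category above, tightening FBA upper bounds according to expression levels, can be sketched on a toy network. Everything in this example is an assumption made for illustration: the four-reaction network, the bounds, the relative expression value and the helper names `fba` and `scale_upper_bounds` do not come from the review. It uses `scipy.optimize.linprog` as a generic LP solver, not a dedicated constraint-based modeling toolbox.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network. Rows: metabolites A, B. Columns: reactions.
#   v0: -> A (uptake)   v1: A -> B   v2: B -> (product)   v3: A -> (by-product)
S = np.array([
    [1.0, -1.0,  0.0, -1.0],   # mass balance for A
    [0.0,  1.0, -1.0,  0.0],   # mass balance for B
])


def fba(stoichiometry, bounds, objective_index):
    """Maximize one flux subject to S v = 0 and per-reaction bounds."""
    c = np.zeros(stoichiometry.shape[1])
    c[objective_index] = -1.0  # linprog minimizes, so negate to maximize
    res = linprog(
        c,
        A_eq=stoichiometry,
        b_eq=np.zeros(stoichiometry.shape[0]),
        bounds=bounds,
        method="highs",
    )
    if not res.success:
        raise RuntimeError(res.message)
    return res.x


def scale_upper_bounds(bounds, expression):
    """Shrink upper bounds by a relative expression level in [0, 1].

    Reactions without an expression entry keep their original bound.
    """
    return [
        (lb, ub * expression.get(i, 1.0))
        for i, (lb, ub) in enumerate(bounds)
    ]


base_bounds = [(0.0, 10.0)] * 4
v_base = fba(S, base_bounds, objective_index=2)

# Assumed relative expression of the enzyme catalysing v1 (A -> B).
expression = {1: 0.4}
v_expr = fba(S, scale_upper_bounds(base_bounds, expression), objective_index=2)

print("product flux without expression data:", round(float(v_base[2]), 6))
print("product flux with scaled bound:", round(float(v_expr[2]), 6))
```

Scaling the bound on `v1` narrows the feasible flux region, so the optimal product flux becomes limited by the expression-derived bound instead of the uptake limit. Real methods in this category differ in how expression values are mapped to bounds.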
- Research Article
- 10.1007/s10994-025-06868-0
- Jan 1, 2025
- Machine Learning
Reasoning about hypotheses and updating knowledge through empirical observations are central to scientific discovery. In this work, we applied logic-based machine learning methods to drive biological discovery by guiding experimentation. Genome-scale metabolic network models (GEMs) - comprehensive representations of metabolic genes and reactions - are widely used to evaluate genetic engineering of biological systems. However, GEMs often fail to accurately predict the behaviour of genetically engineered cells, primarily due to incomplete annotations of gene interactions. The task of learning the intricate genetic interactions within GEMs presents computational and empirical challenges. To make efficient predictions with GEMs, we describe a novel approach called Boolean Matrix Logic Programming (BMLP), which leverages Boolean matrices to evaluate large logic programs. We developed a new system, BMLP_{active}, which guides cost-effective experimentation and uses interpretable logic programs to encode a state-of-the-art GEM of a model bacterial organism. Notably, BMLP_{active} successfully learned the interaction between a gene pair with fewer training examples than random experimentation, overcoming the increase in experimental design space. BMLP_{active} enables rapid optimisation of metabolic models to reliably engineer biological systems for producing useful compounds. It offers a realistic approach to creating a self-driving lab for biological discovery, which would then facilitate microbial engineering for practical applications.
- Research Article
44
- 10.1002/aps3.11371
- Jun 1, 2020
- Applications in Plant Sciences
Plants meet machines: Prospects in machine learning for plant biology
- Research Article
240
- 10.1093/bioinformatics/btq183
- Jun 1, 2010
- Bioinformatics
Motivation: The availability of modern sequencing techniques has led to a rapid increase in the number of reconstructed metabolic networks. Using these models as a platform for the analysis of high-throughput transcriptomic, proteomic and metabolomic data can provide valuable insight into conditional changes in the metabolic activity of an organism. While transcriptomics and proteomics provide important insights into the hierarchical regulation of metabolic flux, metabolomics sheds light on the actual enzyme activity through metabolic regulation and mass action effects. Here we introduce a new method, termed integrative omics-metabolic analysis (IOMA), that quantitatively integrates proteomic and metabolomic data with genome-scale metabolic models to more accurately predict metabolic flux distributions. The method is formulated as a quadratic programming (QP) problem that seeks a steady-state flux distribution in which flux through reactions with measured proteomic and metabolomic data is as consistent as possible with kinetically derived flux estimations. Results: IOMA is shown to successfully predict the metabolic state of human erythrocytes (compared to kinetic model simulations), showing a significant advantage over the commonly used methods flux balance analysis and minimization of metabolic adjustment. Thereafter, IOMA is shown to correctly predict metabolic fluxes in Escherichia coli under different gene knockouts for which both metabolomic and proteomic data are available, achieving higher prediction accuracy than the extant methods. Considering the lack of high-throughput flux measurements, while high-throughput metabolomic and proteomic data are becoming readily available, we expect IOMA to significantly contribute to future research of cellular metabolism. Contacts: kerenyiz@post.tau.ac.il; tomersh@cs.technion.ac.il
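The general shape of such a QP can be written down in a simplified form. The following is a generic least-squares sketch, not the exact IOMA formulation: the symbols $S$ (stoichiometric matrix), $v$ (flux vector), $M$ (set of reactions with measured proteomic and metabolomic data), $\hat{v}_i$ (kinetically derived flux estimate for reaction $i$) and the bounds $v^{\min}, v^{\max}$ are notation assumed here for illustration.

```latex
\begin{aligned}
\min_{v} \quad & \sum_{i \in M} \left( v_i - \hat{v}_i \right)^2 \\
\text{s.t.} \quad & S v = 0, \\
& v^{\min} \le v \le v^{\max}
\end{aligned}
```

The equality constraint enforces steady state, while the quadratic objective pulls the fluxes of measured reactions toward their kinetic estimates. In the published method the estimates themselves depend on kinetic parameters, which this sketch treats as fixed inputs.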
- Research Article
19
- 10.1007/s10529-013-1328-x
- Sep 28, 2013
- Biotechnology Letters
Elementary modes (EMs) are steady-state metabolic flux vectors with a minimal set of active reactions. Each EM corresponds to a metabolic pathway. Therefore, studying EMs is helpful for analyzing the production of biotechnologically important metabolites. However, memory requirements for computing EMs may hamper their applicability because, in most genome-scale metabolic models, no EM can be computed due to running out of memory. In this study, we present a method for computing randomly sampled EMs. In this approach, a network reduction algorithm is used for EM computation, which is based on flux balance-based methods. We show that this approach can be used to recover the EMs in medium- and genome-scale metabolic network models, while the EMs are sampled in an unbiased way. The applicability of such results is shown by computing “estimated” control-effective flux values in the Escherichia coli metabolic network.
- Dissertation
- 10.18174/416473
- Jul 4, 2017
Metabolic modeling to understand and redesign microbial systems
- Research Article
2
- 10.15171/ijb.1684
- Aug 11, 2018
- Iranian Journal of Biotechnology
Background: A genome-scale metabolic network model (GEM) is a mathematical representation of an organism’s metabolism. Today, GEMs are popular tools for computationally simulating the biotechnological processes and for predicting biochemical properties of (engineered) strains. Objectives: In the present study, we have evaluated the predictive power of two GEMs, namely iBsu1103 (for Bacillus subtilis 168) and iMZ1055 (for Bacillus megaterium WSH002). Materials and Methods: For comparing the predictive power of Bacillus subtilis and Bacillus megaterium GEMs, experimental data were obtained from previous wet-lab studies included in PubMed. By using these data, we set the environmental, stoichiometric and thermodynamic constraints on the models, and FBA is performed to predict the biomass production rate, and the values of other fluxes. For simulating experimental conditions in this study, COBRA toolbox was used. Results: By using the wealth of data in the literature, we evaluated the accuracy of in silico simulations of these GEMs. Our results suggest that there are some errors in these two models which make them unreliable for predicting the biochemical capabilities of these species. The inconsistencies between experimental and computational data are even greater where B. subtilis and B. megaterium do not have similar phenotypes. Conclusions: Our analysis suggests that literature-based improvement of genome-scale metabolic network models of the two Bacillus species is essential if these models are to be successfully applied in biotechnology and metabolic engineering.
- Research Article
4
- 10.21859/ijb.1684
- Aug 1, 2018
- Iranian Journal of Biotechnology
Background: A genome-scale metabolic network model (GEM) is a mathematical representation of an organism’s metabolism. Today, GEMs are popular tools for computationally simulating the biotechnological processes and for predicting biochemical properties of (engineered) strains. Objectives: In the present study, we have evaluated the predictive power of two GEMs, namely iBsu1103 (for Bacillus subtilis 168) and iMZ1055 (for Bacillus megaterium WSH002). Materials and Methods: For comparing the predictive power of Bacillus subtilis and Bacillus megaterium GEMs, experimental data were obtained from previous wet-lab studies included in PubMed. By using these data, we set the environmental, stoichiometric and thermodynamic constraints on the models, and FBA is performed to predict the biomass production rate, and the values of other fluxes. For simulating experimental conditions in this study, COBRA toolbox was used. Results: By using the wealth of data in the literature, we evaluated the accuracy of in silico simulations of these GEMs. Our results suggest that there are some errors in these two models which make them unreliable for predicting the biochemical capabilities of these species. The inconsistencies between experimental and computational data are even greater where B. subtilis and B. megaterium do not have similar phenotypes. Conclusions: Our analysis suggests that literature-based improvement of genome-scale metabolic network models of the two Bacillus species is essential if these models are to be successfully applied in biotechnology and metabolic engineering.
- Research Article
38
- 10.1016/j.coisb.2021.03.001
- Mar 1, 2021
- Current Opinion in Systems Biology
Machine learning applications in genome-scale metabolic modeling