Utilizing the Heterogeneity of Clinical Data for Model Refinement and Rule Discovery Through the Application of Genetic Algorithms to Calibrate a High-Dimensional Agent-Based Model of Systemic Inflammation.

Chase Cockrell, Gary An

doi:10.3389/fphys.2021.662845

Abstract

Introduction: Accounting for biological heterogeneity represents one of the greatest challenges in biomedical research. Dynamic computational and mathematical models can be used to enhance the study and understanding of biological systems, but traditional methods for calibration and validation commonly do not account for the heterogeneity of biological data, which may result in overfitting and brittleness of these models. Herein we propose a machine learning approach that utilizes genetic algorithms (GAs) to calibrate and refine an agent-based model (ABM) of acute systemic inflammation, with a focus on accounting for the heterogeneity seen in a clinical data set, thereby avoiding overfitting and increasing the robustness and potential generalizability of the underlying simulation model.

Methods: Agent-based modeling is a frequently used method for multi-scale mechanistic modeling. However, the same properties that make ABMs well suited to representing biological systems also present significant challenges for their construction and calibration, particularly in the selection of potential mechanistic rules and the large number of associated free parameters. We have proposed that machine learning approaches (such as GAs) can be used to deal more effectively and efficiently with rule selection and parameter space characterization; the current work applies GAs to the challenge of calibrating a complex ABM to a specific data set while preserving the biological heterogeneity reflected in the range and variance of the data. This project uses a GA to augment the rule set of a previously validated ABM of acute systemic inflammation, the Innate Immune Response ABM (IIRABM), so that it fits clinical time series data of systemic cytokine levels from a population of burn patients. The genome for the GA is a vector generated from the IIRABM’s Model Rule Matrix (MRM), which is a matrix representation of not only the constants/parameters associated with the IIRABM’s cytokine interaction rules, but also the existence of the rules themselves. Heterogeneity is captured by a fitness function that incorporates the sample value range (“error bars”) of the clinical data.

Results: The GA-enabled parameter space exploration resulted in a set of putative MRM rules and associated parameterizations that closely match the cytokine time course data used to design the fitness function. The number of non-zero elements in the MRM increases significantly as the model parameterizations evolve toward a fitness function minimum, transitioning from a sparse to a dense matrix. This results in a model structure that more closely resembles (at a superficial level) the structure of data generated by a standard differential gene expression experimental study.

Conclusion: We present an HPC-enabled machine learning/evolutionary computing approach to calibrate a complex ABM to complex clinical data while preserving biological heterogeneity. The integration of machine learning, HPC, and multi-scale mechanistic modeling provides a pathway toward more effectively representing the heterogeneity of clinical populations and their data.
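The genome encoding and range-based fitness described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`mrm_to_genome`, `genome_to_mrm`, `range_fitness`), the row-by-row flattening, the convention that a zero entry means "no rule", and the simple out-of-range penalty are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def mrm_to_genome(mrm: Matrix) -> List[float]:
    """Flatten a rule matrix row by row into a genome vector (assumed layout)."""
    return [value for row in mrm for value in row]


def genome_to_mrm(genome: Sequence[float], n_cols: int) -> Matrix:
    """Rebuild the rule matrix from a flat genome (inverse of mrm_to_genome)."""
    return [list(genome[i:i + n_cols]) for i in range(0, len(genome), n_cols)]


def range_fitness(
    simulated: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
) -> float:
    """Hypothetical fitness: total distance by which simulated values fall
    outside the observed [low, high] range at each time point.
    A value of 0.0 means every point lies inside its error bars."""
    total = 0.0
    for value, (low, high) in zip(simulated, bounds):
        if value < low:
            total += low - value
        elif value > high:
            total += value - high
    return total


# Toy 2x2 matrix; a zero entry is taken here to mean "rule absent".
toy_mrm = [[0.0, 1.5], [-2.0, 0.0]]
toy_genome = mrm_to_genome(toy_mrm)

# Three time points: the first is inside its range, the other two are not.
toy_score = range_fitness([5.0, 12.0, 1.0], [(4.0, 6.0), (8.0, 10.0), (2.0, 3.0)])
```

Under this kind of penalty, any parameterization whose trajectory stays inside the observed range scores equally well, which is one way a fitness function could reward matching the spread of the data rather than a single mean curve.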

Highlights

Accounting for biological heterogeneity represents one of the greatest challenges in biomedical research
Given the high-dimensional nature of this type of model parameter space, we propose to use a machine learning/evolutionary computing optimization method, genetic algorithms (GAs), in order to generate an ensemble of parameterizations able to recapitulate a heterogeneous clinical data set
The GA could not converge well enough to produce a Model Rule Matrix (MRM) able to generate IL-10 concentrations that matched the literature, with the peak occurring at 6 h post-insult rather than at 5 days post-insult, as was seen clinically (Figure 2A)

Summary

Introduction

Accounting for biological heterogeneity represents one of the greatest challenges in biomedical research. We propose a machine learning approach that utilizes genetic algorithms (GAs), a method for complex optimization, to calibrate and refine an agent-based model (ABM) of acute systemic inflammation so that it captures the heterogeneity and variability seen in a clinical data set, thereby avoiding overfitting and increasing the robustness and potential generalizability of the underlying simulation model. This method departs from traditional approaches to calibration and parameterization, which generally focus on using “cleaner” data sets with less variation/heterogeneity and/or on fitting to a regression that draws a mean through whatever variation is present in the selected data, a process that can result in over-fit and brittle models. We propose that models (in terms of both parameters and interaction rules) selected for their ability to reproduce the entire range of data within a data set are more robust and generalizable, and better able to enhance the translation and applicability of knowledge.
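As a rough illustration of selecting parameterizations against a range of data rather than a mean, the following sketch runs a small elitist GA on a toy linear model. Everything here is assumed for illustration: the toy model `simulate`, the time points and bounds, the GA settings, and the names `evolve` and `range_fitness`. It stands in for, and is far simpler than, the calibration of the IIRABM described in the paper.

```python
import random
from typing import Callable, List, Sequence, Tuple

Genome = List[float]

# Hypothetical observed ranges ("error bars") at four time points.
TIMES = [0.0, 1.0, 2.0, 3.0]
BOUNDS = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


def simulate(genome: Sequence[float]) -> List[float]:
    """Toy stand-in for a simulation: intercept plus slope times time."""
    return [genome[0] + genome[1] * t for t in TIMES]


def range_fitness(genome: Sequence[float]) -> float:
    """Total distance outside the observed ranges; 0.0 means fully inside."""
    total = 0.0
    for value, (low, high) in zip(simulate(genome), BOUNDS):
        total += max(low - value, 0.0) + max(value - high, 0.0)
    return total


def evolve(
    fitness: Callable[[Sequence[float]], float],
    genome_len: int = 2,
    pop_size: int = 30,
    generations: int = 40,
    mutation_scale: float = 0.3,
    seed: int = 0,
) -> Tuple[List[Genome], List[float]]:
    """Minimal elitist GA: keep the better half, refill with mutated crossovers."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(-2.0, 2.0) for _ in range(genome_len)]
        for _ in range(pop_size)
    ]
    history: List[float] = []
    for _ in range(generations):
        population.sort(key=fitness)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]  # survivors are kept unchanged
        children: List[Genome] = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)  # one-point crossover
            child = a[:cut] + b[cut:]
            idx = rng.randrange(genome_len)
            child[idx] += rng.gauss(0.0, mutation_scale)  # Gaussian mutation
            children.append(child)
        population = parents + children
    population.sort(key=fitness)
    history.append(fitness(population[0]))
    return population, history


final_population, best_history = evolve(range_fitness)
# Genomes that score 0.0 form an ensemble of distinct parameterizations,
# all of which stay inside the observed ranges.
ensemble = [g for g in final_population if range_fitness(g) == 0.0]
```

Because the surviving half of each generation is carried over unchanged, the best fitness in `best_history` never gets worse from one generation to the next. Any genomes collected in `ensemble` differ from one another yet all satisfy the range criterion, which is the sense in which a range-based objective can yield a family of acceptable models instead of a single best fit.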

Objectives

Methods

Results

Conclusion