Abstract

Background: Computational models in medical research that use molecular features to predict patient sensitivities or outcomes are traditionally limited to producing a scalar output for a given disease phenotype, e.g. progression-free survival or drug response. Even for relatively intelligible models, it is generally difficult to place confidence limits on a prediction, or to develop an intuition for how the prediction responds to “what-if” changes in the input features. Moreover, although molecular features are essential for understanding and modeling tumor behavior, the features available from patient tumors in real-world settings are typically limited to profiles of hundreds of genes, while the set of biologically relevant inputs to the patient’s condition can be much larger.

Approach: Enhancing the interpretation of predictive models and extending the utility of real-world patient oncology datasets requires a distribution of conditions consistent with the observed features and their incidence across cancer patient cohorts. We hypothesized that we could synthesize patient samples drawn from the joint probability distribution of a broader universe of features while a subset of those features is held fixed at the observed values. We model this joint distribution by learning a Bayesian network over a broad feature set for which some training data is available, and we generate feature profiles by applying Gibbs sampling to the learned network.

Results: We assessed the potential clinical utility of these generated feature profiles for predicting drug response and for enhancing the biological interpretation of different model outputs. We found that generated somatic mutations were useful for predicting cancer patient outcomes, including drug response in breast cancer tumors from AACR Project GENIE. Synthesized patient feature profiles enhanced the biological interpretability of nominal panel data (100-200 genes), providing a means to assign probabilities to outcomes and uncovering previously described mechanisms of drug response in real-world patient data.

Conclusion: We introduce an approach that leverages Bayesian networks to synthesize richly annotated patient feature profiles from the limited molecular data routinely collected in real-world settings. This approach addresses challenges associated with limited molecular data, biological interpretability, and the evaluation of predictive models, and it enhances the utility of real-world datasets for cancer research.

Citation Format: Dillon H. Tracy, Jeff Sherman, Maayan Baron. Generative Bayesian networks for augmentation of molecular data from commercial genetics panels [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 7373.
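The sampling step described in the Approach (Gibbs sampling over a Bayesian network with the observed panel features held fixed) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the four binary features, the network structure in `PARENTS`, the probabilities in `CPT`, and the function names. In particular, the authors learn their network from data, whereas this toy network is hard-coded.

```python
import random

# Hypothetical toy network over binary mutation indicators (1 = mutated).
# Structure and probabilities are invented for illustration only.
PARENTS = {
    "offpanel_gene_1": [],
    "panel_gene_2": [],
    "panel_gene_1": ["offpanel_gene_1"],
    "offpanel_gene_2": ["offpanel_gene_1", "panel_gene_2"],
}

# CPT[node][tuple of parent values] = P(node = 1 | parents)
CPT = {
    "offpanel_gene_1": {(): 0.30},
    "panel_gene_2": {(): 0.25},
    "panel_gene_1": {(0,): 0.10, (1,): 0.60},
    "offpanel_gene_2": {(0, 0): 0.02, (0, 1): 0.05, (1, 0): 0.15, (1, 1): 0.25},
}


def joint(state):
    """Joint probability of a full assignment: product of the local CPT entries."""
    p = 1.0
    for node, parents in PARENTS.items():
        p_one = CPT[node][tuple(state[q] for q in parents)]
        p *= p_one if state[node] == 1 else 1.0 - p_one
    return p


def gibbs_sample(evidence, n_samples, burn_in=200, seed=0):
    """Draw full feature profiles with the `evidence` features clamped."""
    rng = random.Random(seed)
    free = [n for n in PARENTS if n not in evidence]
    state = dict(evidence)
    for n in free:
        state[n] = rng.randint(0, 1)  # arbitrary starting point
    samples = []
    for step in range(burn_in + n_samples):
        for n in free:
            # Resample n from P(n | all other features), which is
            # proportional to the joint with n set to 1 versus 0.
            state[n] = 1
            w_one = joint(state)
            state[n] = 0
            w_zero = joint(state)
            state[n] = 1 if rng.random() < w_one / (w_one + w_zero) else 0
        if step >= burn_in:
            samples.append(dict(state))
    return samples


# Observed panel features are held fixed; off-panel features are synthesized.
observed = {"panel_gene_1": 1, "panel_gene_2": 0}
profiles = gibbs_sample(observed, n_samples=4000)
estimate = sum(p["offpanel_gene_1"] for p in profiles) / len(profiles)
print(f"Estimated P(offpanel_gene_1 = 1 | observed) = {estimate:.3f}")
```

For this toy network the exact posterior follows from Bayes' rule, P(offpanel_gene_1 = 1 | panel_gene_1 = 1) = 0.18 / (0.18 + 0.07) = 0.72, so the sampled frequency should approach that value. Each sampled profile is a complete feature assignment consistent with the observed panel values, which is what allows probabilities to be attached to downstream model outputs. Evaluating the full joint for every update is only practical for a handful of features; a larger network would restrict each update to the node's Markov blanket.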

