Abstract

How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question. We present an optimal experimental design method (coined OPEX) to identify informative omics experiments using machine learning models for both experimental space exploration and model training. OPEX-guided exploration of Escherichia coli populations exposed to biocide and antibiotic combinations led to more accurate predictive models of gene expression with 44% less data. Analysis of the proposed experiments shows that broad exploration of the experimental space followed by fine-tuning emerges as the optimal strategy. Additionally, analysis of the experimental data reveals 29 cases of cross-stress protection and 4 cases of cross-stress vulnerability. Further validation reveals the central role of chaperones, stress response proteins and transport pumps in cross-stress exposure. This work demonstrates how active learning can be used to guide omics data collection for training predictive models, making evidence-driven decisions and accelerating knowledge discovery in life sciences.

Highlights

  • How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question

  • Gaussian process (GP) models were trained to predict genome-wide gene expression for all combinations, and OPEX was used to guide 30 cycles of experimentation

  • Each OPEX cycle resulted in a different biocide–antibiotic combination to explore (Fig. 2a), with the GP-based model being retrained with each new dataset obtained
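
The cycle described in the highlights above (pick a combination, measure it, update the GP) can be illustrated with a minimal sketch. This is not the OPEX implementation: it assumes a squared-exponential kernel with fixed hyperparameters, a toy 3 × 3 grid of hypothetical (biocide, antibiotic) indices, and a pure uncertainty-sampling rule that picks the untested combination with the largest GP predictive variance. The names `rbf`, `posterior_variance` and `run_cycles` are illustrative. Measured expression values are omitted because, with fixed hyperparameters, a GP's predictive variance depends only on which conditions have been sampled.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two design points (tuples)."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * length ** 2))

def cholesky(A):
    """Lower-triangular L with L L^T = A, for a symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def posterior_variance(measured, x_new, noise=1e-6):
    """GP predictive variance at x_new given the already-measured designs."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(measured)]
         for i, a in enumerate(measured)]
    L = cholesky(K)
    k_star = [rbf(a, x_new) for a in measured]
    v = []  # forward substitution: solve L v = k_star
    for i in range(len(k_star)):
        v.append((k_star[i] - sum(L[i][k] * v[k] for k in range(i))) / L[i][i])
    # variance = k(x*, x*) - k*^T K^{-1} k* = k(x*, x*) - ||v||^2
    return rbf(x_new, x_new) - sum(t * t for t in v)

def run_cycles(candidates, measured, n_cycles):
    """Each cycle selects the untested design with the highest predictive variance."""
    measured = list(measured)
    chosen = []
    for _ in range(n_cycles):
        untested = [c for c in candidates if c not in measured]
        nxt = max(untested, key=lambda c: posterior_variance(measured, c))
        # In a real campaign the experiment would be run here and the model refit.
        measured.append(nxt)
        chosen.append(nxt)
    return chosen

# Hypothetical design space: 3 biocides x 3 antibiotics, indexed 0..2.
grid = [(b, a) for b in range(3) for a in range(3)]
picks = run_cycles(grid, measured=[(0, 0)], n_cycles=4)
print(picks)
```

Starting from a single measured corner, the first pick is the opposite corner of the grid, which is the point the model knows least about. A real criterion would also use the measured values and would typically re-estimate the kernel hyperparameters at each cycle.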


Summary

Introduction

How to design experiments that accelerate knowledge discovery on complex biological landscapes remains a tantalizing question. We present an optimal experimental design (OED) method, coined OPEX, that identifies informative omics experiments using machine learning models for both experimental space exploration and model training. OED methods usually formulate a sampling problem as an optimization problem that aims to identify the experiment(s) to perform so that a specific objective is maximized and a set of constraints is met [22]. These methods usually balance exploration (global search, maximizing coverage of the experimental space) and exploitation (local search, refining existing solutions) objectives. Although never before used for omics experiments, OED methods can be especially useful for exploring the experimental space efficiently across a multitude of design dimensions and for producing training data that carry the maximum information content for training a predictive model.
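
To make the exploration and exploitation trade-off concrete, the sketch below scores each untested design as a weighted sum of two terms: its distance to the nearest tested design (coverage of the space) and its closeness to the best design found so far (local refinement), subject to a feasibility constraint. This is a generic illustration, not the objective used by OPEX; the function `select_next`, the `weight` parameter and the `feasible` predicate are assumptions introduced here.

```python
import math

def select_next(candidates, results, weight=0.5, feasible=lambda c: True):
    """Pick the next design to test.

    candidates: list of design points (tuples of numbers).
    results:    dict mapping each tested design to its observed objective value.
    weight:     1.0 is pure exploration, 0.0 is pure exploitation.
    feasible:   constraint check; designs failing it are never proposed.
    """
    best = max(results, key=results.get)  # best design observed so far
    untested = [c for c in candidates if c not in results and feasible(c)]

    def score(c):
        # Exploration: reward designs far from everything already tested.
        exploration = min(math.dist(c, t) for t in results)
        # Exploitation: reward designs close to the current best.
        exploitation = -math.dist(c, best)
        return weight * exploration + (1 - weight) * exploitation

    return max(untested, key=score)

# Toy one-dimensional design space with two experiments already run.
designs = [(0,), (1,), (2,), (3,), (4,)]
observed = {(0,): 1.0, (1,): 3.0}

explore_pick = select_next(designs, observed, weight=1.0)
exploit_pick = select_next(designs, observed, weight=0.0)
print(explore_pick, exploit_pick)
```

With `weight=1.0` the rule proposes the design farthest from all tested points, and with `weight=0.0` it proposes the nearest untested neighbour of the current best. Intermediate weights interpolate between the two behaviours.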

Methods
Results
Conclusion
