Optimization of medium components for protein production by Escherichia coli with a high-throughput pipeline that uses a deep neural network

Kazuki Watanabe, Tai-Ying Chiou, Masaaki Konishi

doi:10.1016/j.jbiosc.2024.01.005

Abstract

To rapidly optimize the medium for green fluorescent protein expression by Escherichia coli carrying an introduced plasmid, pRSET/emGFP, a single-cycle optimization pipeline was applied. The pipeline combined a deep neural network (DNN) with mathematical optimization algorithms and optimized 18 medium components simultaneously. To evaluate the data sampling method for the DNN, two methods, orthogonal array (OA) and Latin hypercube sampling (LHS), were each used to design 64 initial media. The OA- and LHS-based data sampling resulted in green fluorescent protein fluorescence intensities of 0.088 × 10^3 to 1.85 × 10^4 and 3.30 × 10^3 to 1.50 × 10^4, respectively. Fifty DNN models were built using the OA and LHS datasets. Hold-out validation was performed using 15% of the OA and LHS data as the test set. Mean square errors of the DNN models were 0.015-0.64, indicating that the estimation accuracy was sufficient. However, the sensitivities of the components varied among the DNN models and were grouped into six major classes by k-means clustering. A representative model was selected for each class. Mathematical optimization using Bayesian optimization and a genetic algorithm was applied to the representative models, and representative optimized medium (OM) compositions were selected from the proposed OMs by k-means clustering. A total of 54 OMs were obtained from the OA and LHS datasets. In the validation cultivation, the best OMs from the OA and LHS datasets gave results 2.12-fold and 2.13-fold higher, respectively, than those of the learning data.
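As an illustration of the LHS design step described above, the following is a minimal sketch of how 64 media over 18 components could be drawn by Latin hypercube sampling. It is not the authors' implementation: the function names `latin_hypercube` and `scale_to_bounds`, the fixed seed, and the concentration bounds (0 to 10 in arbitrary units for every component) are assumptions made only for this example.

```python
import random

def latin_hypercube(n_samples, n_dims, seed=0):
    """Return n_samples points in [0, 1)^n_dims such that, in every
    dimension, each of the n_samples equal-width strata holds one point."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_dims):
        # Draw one uniform point inside each stratum, then shuffle the
        # strata so that dimensions are paired at random.
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)
        columns.append(strata)
    # Transpose: one row per sample (medium), one column per dimension.
    return [[columns[d][i] for d in range(n_dims)] for i in range(n_samples)]

def scale_to_bounds(design, lows, highs):
    """Map unit-cube coordinates to concentration ranges."""
    return [[lo + u * (hi - lo) for u, lo, hi in zip(row, lows, highs)]
            for row in design]

N_MEDIA = 64        # initial media per sampling method (from the abstract)
N_COMPONENTS = 18   # medium components optimized (from the abstract)

# Hypothetical bounds; the real ranges are not given in the abstract.
lows = [0.0] * N_COMPONENTS
highs = [10.0] * N_COMPONENTS

design = latin_hypercube(N_MEDIA, N_COMPONENTS)
media = scale_to_bounds(design, lows, highs)
```

Each row of `media` is one candidate medium composition. Because every component's range is split into 64 strata and each stratum is used exactly once, the design covers each component's range evenly even with only 64 runs.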

Full Text