Gaussian Process Regression-Based Near-Infrared D-Luciferin Analogue Design Using Mutation-Controlled Graph-Based Genetic Algorithm.

Sung Wook Moon, Seung Kyu Min

doi:10.1021/acs.jcim.3c00870

Abstract

Molecular discovery is central to the field of chemical informatics. Although optimization approaches that target specific molecular properties have been developed in combination with machine learning techniques, efficient molecular design remains challenging when only databases of limited size are available. We present a molecular design method that combines a Gaussian process regression model with a graph-based genetic algorithm (GB-GA) and starts from a data set comprising a small number of compounds. Introducing mutation probability control in the genetic algorithm enhances the optimization capability and speeds up convergence to the optimal solution. In addition, we propose reducing the number of parameters in the conventional GB-GA, focusing on efficient molecular design from a small database. We generated a target-specific database by combining active learning and iterative design in the evolutionary methodology and chose Gaussian process regression as the prediction model for molecular properties. On goal-directed benchmarks with several drug-like molecules, we show that the proposed scheme optimizes toward the target properties more efficiently than the conventional GB-GA method. Finally, as a demonstration, we designed D-luciferin analogues with near-infrared fluorescence, which is desirable for effective in vivo light sources in bioimaging, starting from a small data set.
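
The interplay between a Gaussian process surrogate and a genetic algorithm with a controlled mutation probability can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: bit vectors stand in for molecular graphs, the score is the number of set bits, the kernel is a plain squared-exponential, and the control rule (double the mutation probability when the best score stagnates, reset it on improvement) is a hypothetical choice, not the authors' scheme. The names `gp_fit`, `gp_mean`, and `evolve` are likewise hypothetical.

```python
import math
import random

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two feature vectors."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def gp_fit(X, y, noise=1e-6):
    """Weights alpha = (K + noise * I)^-1 y of the GP posterior mean."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    return solve(K, y)

def gp_mean(x, X, alpha):
    """GP posterior mean at x (zero prior mean assumed)."""
    return sum(rbf(x, xi) * a for xi, a in zip(X, alpha))

def evolve(score, n_bits=8, pop_size=12, generations=20,
           p_base=0.05, p_max=0.5, seed=0):
    """Toy GA; the mutation-control rule below is an illustrative assumption."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best, p_mut, history = max(pop, key=score), p_base, []
    for _ in range(generations):
        parents = sorted(pop, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)          # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
        gen_best = max(pop, key=score)
        if score(gen_best) > score(best):
            best, p_mut = gen_best, p_base          # progress: reset the rate
        else:
            p_mut = min(p_max, 2 * p_mut)           # stagnation: explore more
        history.append(p_mut)
    return best, history

# Small "database": six bit vectors scored by their number of set bits.
X = [[0, 0, 0, 0, 0, 0, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0],
     [1, 1, 0, 0, 1, 0, 0, 0], [0, 1, 1, 1, 0, 1, 0, 0],
     [1, 1, 1, 0, 1, 1, 1, 0], [1, 1, 1, 1, 1, 1, 1, 1]]
y = [float(sum(x)) for x in X]
alpha = gp_fit(X, y)
best, history = evolve(lambda x: gp_mean(x, X, alpha))
```

In an active-learning setting, the best candidates found this way would be evaluated with the true property calculation, appended to `X` and `y`, and the surrogate refitted before the next round.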

Full Text