Abstract

Combining quantum chemistry characterizations with generative machine learning models has the potential to accelerate molecular discovery. In this paradigm, quantum chemistry acts as a relatively cost-effective oracle for evaluating the properties of particular molecules, while generative models provide a means of sampling chemical space based on learned structure-function relationships. For practical applications, multiple potentially orthogonal properties must be optimized in tandem during a discovery workflow. This carries additional difficulties associated with the specificity of the targets and the ability of the model to reconcile all properties simultaneously. Here, we demonstrate an active learning approach to improve the performance of multi-target generative chemical models. We first demonstrate the effectiveness of a set of baseline models trained on single property prediction tasks in generating novel compounds (i.e., not present in the training data) with various property targets, including both interpolative and extrapolative generation scenarios. For property ranges where accurate targeting proves difficult, the novel compounds suggested by the model are characterized using quantum chemistry, and the new molecules closest to expressing the desired properties are fed back into the generative model for additional training. This gradually improves the generative models' understanding of targeted areas of chemical space and shifts the distribution of the generated compounds toward the targeted values. We then demonstrate the effectiveness of this active learning approach in generating compounds with multiple chemical constraints, including vertical ionization potential, electron affinity, and dipole moment targets, and validate the results at the ωB97X-D3/def2-TZVP level of theory. This method requires no modifications to extant generative approaches, but rather utilizes their inherent generative and predictive aspects for self-refinement, and can be applied to situations where any number of properties with varying degrees of correlation must be optimized simultaneously.
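
The generate, characterize, select, and retrain cycle described above can be illustrated with a minimal toy sketch. Everything in it is a hypothetical stand-in rather than the authors' implementation: `oracle` replaces the quantum chemistry calculation with a simple linear function, `ToyGenerator` replaces the generative chemical model with a Gaussian fit to one-dimensional "molecules", and the parameters `rounds`, `n_samples`, and `keep` are illustrative choices not taken from the paper.

```python
import random
import statistics


def oracle(x):
    """Hypothetical stand-in for a quantum chemistry property calculation."""
    return 2.0 * x + 1.0


class ToyGenerator:
    """Hypothetical stand-in for a generative model: a Gaussian fit to its data."""

    def __init__(self, data):
        self.fit(data)

    def fit(self, data):
        # "Training" here is just re-estimating the mean and spread.
        self.mean = statistics.fmean(data)
        self.std = max(statistics.pstdev(data), 1e-3)

    def sample(self, n, rng):
        return [rng.gauss(self.mean, self.std) for _ in range(n)]


def active_learning(train, target, rounds=10, n_samples=200, keep=20, seed=0):
    """Run the toy loop and return the model plus the per-round targeting error."""
    rng = random.Random(seed)
    data = list(train)
    model = ToyGenerator(data)
    history = []
    for _ in range(rounds):
        # 1. Generate candidates from the current model.
        candidates = model.sample(n_samples, rng)
        # 2. Characterize them with the oracle and rank by distance to the target.
        ranked = sorted(candidates, key=lambda x: abs(oracle(x) - target))
        # 3. Feed the candidates closest to the target back into the training set.
        data.extend(ranked[:keep])
        # 4. Retrain the generator on the augmented data.
        model.fit(data)
        history.append(abs(oracle(model.mean) - target))
    return model, history


if __name__ == "__main__":
    start = [-1.0, -0.5, 0.0, 0.5, 1.0]
    model, history = active_learning(start, target=5.0)
    print("targeting error per round:", [round(e, 3) for e in history])
```

In this toy setting the target lies outside the initial training range, so the loop mimics the extrapolative scenario: each round shifts the generator's distribution toward the region the oracle scores as closest to the target. A multi-target variant would, under the same assumptions, replace the scalar distance with a combined distance over several oracle outputs.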
