Abstract

Irrespective of a molecule's final application, synthetic accessibility is the rate-determining step in discovering and developing novel entities. However, synthetic complexity is difficult to quantify as a single metric because it is a composite of several measurable factors, including cost, safety, and availability. Defining a single synthetic accessibility metric for both natural and non-natural products poses a further challenge, given the structural distinctions between these two classes of compounds. Here, inspired by the Central Limit Theorem, we propose a model for the synthetic accessibility of all chemical compounds and devise a novel metric, fitted to a Gaussian distribution, that assesses the overall feasibility of making a chemical compound. Our approach uses a Gaussian mixture model (GMM) and an autoencoder to rank the synthetic complexity of natural products. This model can inform the total synthesis of natural products, process chemistry in pharmaceutical contexts, materials science, and chemical engineering. Based on our findings, we conclude that the autoencoder is better suited than the GMM to model the true probability distribution of synthetic complexity for natural products.
