Abstract
In cheminformatics, machine learning methods are typically used to classify chemical compounds into distinct classes such as active/inactive or toxic/nontoxic. To train a classifier, the training data set must contain examples from both the positive and the negative class. While biological activity or toxicity can be measured experimentally, synthetic feasibility, another important molecular property, is a more abstract feature that cannot be easily assessed. In the present paper, we introduce Nonpher, a computational method for the construction of a hard-to-synthesize virtual library. Nonpher is based on a molecular morphing algorithm in which new structures are iteratively generated by simple structural changes, such as the addition or removal of an atom or a bond. In Nonpher, molecular morphing was optimized so that it yields structures that are hard to synthesize without being overly complex. Nonpher results were compared with SAscore and dense region (DR), two other methods for generating hard-to-synthesize compounds. A random forest classifier trained on Nonpher data achieves better results than models trained on SAscore and DR data.
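The iterative morphing idea described above can be illustrated with a toy sketch. It is not Nonpher's implementation: molecules are reduced to heavy-atom graphs with single bonds only, valence is approximated by a hypothetical per-element degree cap (`MAX_DEGREE`), and `Mol`, `morph` and the four operator functions are hypothetical names covering only the four changes the abstract mentions (adding or removing an atom or a bond). Nonpher's control over how complex the morphed structures become is not modelled.

```python
import random
from dataclasses import dataclass, field
from typing import Optional

# Toy valence model (assumption): maximum number of single bonds per element.
MAX_DEGREE = {"C": 4, "N": 3, "O": 2}


@dataclass
class Mol:
    """Heavy-atom graph: element symbols plus undirected single bonds."""
    atoms: list[str]
    bonds: set[frozenset[int]] = field(default_factory=set)

    def degree(self, i: int) -> int:
        return sum(1 for b in self.bonds if i in b)

    def has_free_valence(self, i: int) -> bool:
        return self.degree(i) < MAX_DEGREE[self.atoms[i]]

    def is_connected(self) -> bool:
        # Depth-first search from atom 0 over the bond set.
        if not self.atoms:
            return True
        seen, stack = {0}, [0]
        while stack:
            i = stack.pop()
            for b in self.bonds:
                if i in b:
                    (j,) = b - {i}
                    if j not in seen:
                        seen.add(j)
                        stack.append(j)
        return len(seen) == len(self.atoms)


def add_atom(mol: Mol, rng: random.Random) -> Optional[Mol]:
    # Attach a new random atom to any atom that still has free valence.
    free = [i for i in range(len(mol.atoms)) if mol.has_free_valence(i)]
    if not free:
        return None
    new = len(mol.atoms)
    element = rng.choice(list(MAX_DEGREE))
    return Mol(mol.atoms + [element], mol.bonds | {frozenset((rng.choice(free), new))})


def remove_atom(mol: Mol, rng: random.Random) -> Optional[Mol]:
    # Remove a terminal atom (degree 1) so the graph stays connected.
    terminal = [i for i in range(len(mol.atoms)) if mol.degree(i) == 1]
    if len(mol.atoms) < 2 or not terminal:
        return None
    k = rng.choice(terminal)
    keep = [i for i in range(len(mol.atoms)) if i != k]
    remap = {old: new for new, old in enumerate(keep)}
    bonds = {frozenset(remap[i] for i in b) for b in mol.bonds if k not in b}
    return Mol([mol.atoms[i] for i in keep], bonds)


def add_bond(mol: Mol, rng: random.Random) -> Optional[Mol]:
    # Close a ring between two unbonded atoms that both have free valence.
    n = len(mol.atoms)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)
             if frozenset((i, j)) not in mol.bonds
             and mol.has_free_valence(i) and mol.has_free_valence(j)]
    if not pairs:
        return None
    return Mol(list(mol.atoms), mol.bonds | {frozenset(rng.choice(pairs))})


def remove_bond(mol: Mol, rng: random.Random) -> Optional[Mol]:
    # Remove only ring bonds, i.e. bonds whose removal keeps the graph connected.
    ring_bonds = [b for b in mol.bonds if Mol(mol.atoms, mol.bonds - {b}).is_connected()]
    if not ring_bonds:
        return None
    return Mol(list(mol.atoms), mol.bonds - {rng.choice(ring_bonds)})


OPERATORS = [add_atom, remove_atom, add_bond, remove_bond]


def morph(mol: Mol, steps: int, seed: int = 0) -> Mol:
    """Apply `steps` randomly chosen structural changes, skipping inapplicable ones."""
    rng = random.Random(seed)
    for _ in range(steps):
        candidate = rng.choice(OPERATORS)(mol, rng)
        if candidate is not None:
            mol = candidate
    return mol
```

For example, starting from the heavy-atom graph of ethanol, `morph(Mol(["C", "C", "O"], {frozenset((0, 1)), frozenset((1, 2))}), 50)` returns a new connected structure whose atom degrees respect the toy caps; the input molecule is never mutated because every operator builds a new `Mol`.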
Highlights
Because molecular complexity measures are correlated with molecular weight (MW) [18], ZINC12 structures were binned by MW into eleven intervals, each 50 Da wide
The random forest (RF) model trained on molecules selected according to their SAscore achieved an accuracy of 82.5% (AUC 0.89), whereas training data created by the dense region (DR) method led to an accuracy of only 46.0% (AUC 0.60)
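The MW binning mentioned in the highlights can be sketched as follows. The text gives only the bin width (50 Da) and the number of bins (eleven); the lower edge of the first bin is not stated, so `start` is a required parameter here, and the half-open `[lower, upper)` edge convention and the dropping of out-of-range compounds are assumptions. `mw_bin` and `bin_by_mw` are hypothetical helper names, and molecular weights are taken as precomputed inputs rather than calculated from structures.

```python
import math
from typing import Iterable, Optional

BIN_WIDTH = 50.0  # Da, from the text
N_BINS = 11       # from the text


def mw_bin(mw: float, start: float) -> Optional[int]:
    """0-based index of the half-open bin [start + 50*i, start + 50*(i+1)), or None."""
    idx = math.floor((mw - start) / BIN_WIDTH)
    return idx if 0 <= idx < N_BINS else None


def bin_by_mw(mols: Iterable[tuple[str, float]], start: float) -> dict[int, list[str]]:
    """Group (identifier, MW) pairs into the eleven MW bins; out-of-range MWs are dropped."""
    bins: dict[int, list[str]] = {i: [] for i in range(N_BINS)}
    for name, mw in mols:
        idx = mw_bin(mw, start)
        if idx is not None:
            bins[idx].append(name)
    return bins
```

With a hypothetical `start=100.0`, a compound of 180.16 Da falls into bin 1 (150 to 200 Da), and anything at or above 650 Da lies outside the eleven bins.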
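To help read the accuracy and AUC figures above: accuracy is the fraction of correctly classified compounds, and ROC AUC is the probability that a randomly chosen positive example is scored above a randomly chosen negative one, so an AUC of 0.5 corresponds to chance-level ranking. The sketch below computes both from scratch, with AUC obtained by pairwise comparison (equivalent to the Mann-Whitney U statistic, ties counted as one half). It is illustrative only and is not the evaluation code used in the paper; `accuracy` and `roc_auc` are hypothetical names, and labels are assumed to be 1 (positive) and 0 (negative).

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a positive (label 1) outscores a negative (label 0); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in the number of compounds, which is fine for illustration; a rank-based formulation gives the same value more efficiently on large data sets.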
Summary
Virtual screening is a well-established approach in which potentially biologically active molecules are searched for in large collections of available screening compounds [1, 2]. Virtual screening mostly retrieves structures similar to known ones. If new chemotypes are to be identified, virtual compounds can be assembled from scratch using de novo design [3], which typically generates thousands of potentially novel compounds. Because it is unrealistic to synthesize and test so many compounds [4], their synthetic accessibility is assessed and compounds that are difficult to synthesize are removed from the virtual library. The assessment of synthetic feasibility can be done either manually or computationally, but given the large number of structures, manual examination is usually impractical. It has been demonstrated that medicinal chemists are not very