Abstract

Organic synthesis is widely used in drug discovery and development. Predicting and analyzing the yields of high-throughput coupling reactions is an important and challenging research topic in this field. However, existing methods focus on predicting yield rather than on studying and interpreting the relationship between reaction conditions and yield. To address this problem, an organic chemical synthesis analysis model named OCS-TGBM is proposed, which combines topological data analysis (TDA) with the Light Gradient Boosting Machine (LightGBM). The model explores the relationship between reaction conditions and yield in depth and identifies high-yield reaction conditions and their combinations. To further improve the performance of OCS-TGBM, a stratified diversity sampling strategy is introduced. Experimental results show that OCS-TGBM outperforms other methods in analyzing and predicting the reaction performance of high-throughput organic chemical synthesis. It also provides intelligent assistance for the optimal design of reaction systems and the evaluation of reaction conditions, and can thereby greatly accelerate drug discovery and development.
