Cocrystal virtual screening based on the XGBoost machine learning model

Dezhi Yang,Li Wang,Penghui Yuan,Qi An,Bin Su,Mingchao Yu,Ting Chen,Kun Hu,Li Zhang,Yang Lu,Guanhua Du

doi:10.1016/j.cclet.2022.107964

Abstract

Co-crystal formation can improve the physicochemical properties of a compound, thus enhancing its druggability. Therefore, artificial intelligence-based co-crystal virtual screening in the early stage of drug development has attracted extensive attention from researchers. However, the complexity of developing and applying algorithms hinders it wide application. This study presents a data-driven co-crystal prediction method based on the XGBoost machine learning model of the scikit-learn package. The simplified molecular input line entry specification (SMILES) information of two compounds is simply inputted to determine whether a co-crystal can be formed. The data set includs the co-crystal records presented in the Cambridge Structural Database (CSD) and the records of no co-crystal formation from extant literature and experiments. RDKit molecular descriptors are adopted as the features of a compound in the data set. The developed model shows excellent performance in the proposed co-crystal training and validation sets with high accuracy, sensitivity, and F1 score. The prediction success rate of the model exceeds 90%. The model therefore provides a simple and feasible scheme for designing and screening co-crystal drugs efficiently and accurately.

Full Text