In drug discovery, molecular docking methods face challenges in accurately predicting energy. Scoring functions used in molecular docking often fail to simulate complex protein-ligand interactions fully and accurately leading to biases and inaccuracies in virtual screening and target predictions. We introduce the "Docking Score ML", developed from an analysis of over 200,000 docked complexes from 155 known targets for cancer treatments. The scoring functions used are founded on bioactivity data sourced from ChEMBL and have been fine-tuned using both supervised machine learning and deep learning techniques. We validated our approach extensively using multiple data sets such as validation of selectivity mechanism, the DUDE, DUD-AD, and LIT-PCBA data sets, and performed a multitarget analysis on drugs like sunitinib. To enhance prediction accuracy, feature fusion techniques were explored. By merging the capabilities of the Graph Convolutional Network (GCN) with multiple docking functions, our results indicated a clear superiority of our methodologies over conventional approaches. These advantages demonstrate that Docking Score ML is an efficient and accurate tool for virtual screening and reverse docking.
Read full abstract