Abstract

Virtual screening (VS) has become an integral part of modern drug discovery. The field is now undergoing a new wave of change driven by artificial intelligence and, more specifically, machine learning (ML). The out-of-the-box datasets available for model training or benchmarking are limited in data volume and applicability domain, and they suffer from the biases repeatedly reported in ML applications. To address these issues, we present a novel benchmark named MUBDsyn. Its main feature is the use of synthetic decoys (i.e., presumed inactives), generated with deep reinforcement learning to control bias during decoy generation. We then carried out extensive validation of this new benchmark. First, we confirmed that MUBDsyn was superior to the classical benchmarks in controlling domain bias, artificial enrichment bias and analogue bias. Moreover, we found that the assessment of ML models based on MUBDsyn was less biased, as revealed by the analysis of asymmetric validation embedding bias. In addition, MUBDsyn set a more appropriate benchmarking challenge for deep learning models than NRLiSt-BDB. Overall, our results show that MUBDsyn is a close-to-ideal benchmark for VS. The computational tool is publicly available so that MUBDsyn can be easily extended.

