Abstract

Virtual screening (VS) based on molecular docking has become a mainstream technology in drug discovery because of its low cost and high efficiency. However, the scoring functions (SFs) implemented in most docking programs are not always accurate enough, and improving their prediction accuracy remains a major challenge. Here, we present ASFP, a web server for developing customized SFs for structure-based VS. ASFP has three main modules: (1) a descriptor generation module that can generate up to 3437 descriptors for modelling protein–ligand interactions; (2) an AI-based SF construction module that builds target-specific SFs from the pre-generated descriptors using three machine learning (ML) techniques; and (3) an online prediction module that provides a number of well-constructed target-specific SFs for VS and an additional generic SF for binding affinity prediction. The methodology has been validated on several benchmark datasets: the target-specific SFs achieve an average ROC AUC of 0.973 across 32 targets, and the generic SF achieves a Pearson correlation coefficient of 0.81 on the PDBbind version 2016 core set. In summary, the ASFP server is a powerful tool for structure-based VS.
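For readers unfamiliar with the two validation metrics quoted above, the following is a minimal, dependency-free sketch of how each can be computed. It is not ASFP code: the function names `roc_auc` and `pearson_r` and the toy numbers are assumptions made purely for illustration.

```python
from math import sqrt


def roc_auc(labels, scores):
    """ROC AUC as the fraction of (active, decoy) pairs in which the active
    gets the higher score; tied pairs count as half."""
    actives = [s for y, s in zip(labels, scores) if y == 1]
    decoys = [s for y, s in zip(labels, scores) if y == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(actives) * len(decoys))


def pearson_r(predicted, measured):
    """Pearson correlation between predicted and measured binding affinities."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_m = sum(measured) / n
    cov = sum((p - mean_p) * (m - mean_m) for p, m in zip(predicted, measured))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_m = sum((m - mean_m) ** 2 for m in measured)
    if var_p == 0 or var_m == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_p * var_m)


if __name__ == "__main__":
    # Toy screening result: 1 = known binder, 0 = decoy (made-up values).
    labels = [1, 1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
    # 8 of the 9 active/decoy pairs are ordered correctly.
    print("ROC AUC:", roc_auc(labels, scores))

    # Toy affinity predictions versus measurements (made-up values).
    print("Pearson R:", pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

An ROC AUC of 0.5 corresponds to random ranking of actives and decoys, and 1.0 to perfect separation, which gives a sense of scale for the reported 0.973.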

Highlights

  • As one of the core technologies in virtual screening (VS), molecular docking has been extensively used to screen small-molecule libraries for lead discovery [1].

  • Unlike traditional SFs, machine learning (ML)-based scoring functions (MLSFs) have no particular theory-motivated functional form. They are developed by learning from very large volumes of protein–ligand structural and interaction data with ML algorithms such as random forest (RF), support vector machine (SVM), artificial neural network (ANN), and gradient boosting decision tree (GBDT) [3, 5, 6, 7, 8].

  • Developing an MLSF requires generating a set of features that characterize protein–ligand interactions and being familiar with ML algorithms, which may be a difficult task for non-experts.


Summary

Introduction

As one of the core technologies in virtual screening (VS), molecular docking has been extensively used to screen small-molecule libraries for lead discovery [1]. Four criteria can be used to assess the predictive capability of a scoring function (SF): scoring power (binding affinity prediction), ranking power (relative ranking prediction), docking power (binding pose prediction), and screening power (discrimination of true binders from decoys) [3, 4]. Machine learning (ML)-based SFs (MLSFs) can capture non-linear relationships between protein–ligand interaction features and binding mode that are difficult for classical SFs to characterize, yielding better binding strength predictions [9, 10]. Developing an MLSF, however, requires generating a set of features that characterize protein–ligand interactions and being familiar with ML algorithms, which may be a difficult task for non-experts.
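To make the MLSF workflow described here concrete (descriptors in, trained model out, library ranked by score), the sketch below walks through it on toy data. It is only an illustration under stated assumptions: the two descriptors, the compound names, and the helper functions are hypothetical, and a plain logistic regression stands in for the learners a real MLSF would use (such as RF, SVM, or GBDT) simply to keep the example free of external libraries.

```python
from math import exp


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def train_logistic_sf(X, y, lr=0.5, epochs=500):
    """Fit weights and bias by batch gradient descent on the log-loss.
    X: list of descriptor vectors, y: 1 for actives, 0 for decoys."""
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(model, x):
    """Predicted probability that a docked pose belongs to an active."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical descriptors per docked pose: [favourable-contact term, clash term].
X_train = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],   # known actives
           [0.2, 0.9], [0.1, 0.8], [0.3, 0.7]]   # decoys
y_train = [1, 1, 1, 0, 0, 0]

model = train_logistic_sf(X_train, y_train)

# Rank a small made-up screening library by predicted activity.
library = {"cmpd_A": [0.85, 0.15], "cmpd_B": [0.15, 0.85], "cmpd_C": [0.6, 0.4]}
ranked = sorted(library, key=lambda name: score(model, library[name]), reverse=True)
print(ranked)
```

In practice the descriptor vectors would contain hundreds or thousands of entries per complex and the learner would be non-linear, which is where the difficulty for non-experts noted above comes from.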

