Predicting or Pretending: Artificial Intelligence for Protein-Ligand Interactions Lack of Sufficiently Large and Unbiased Datasets.

Jincai Yang, Cheng Shen, Niu Huang

doi:10.3389/fphar.2020.00069

Abstract

Predicting protein-ligand interactions using artificial intelligence (AI) models has attracted great interest in recent years. However, data-driven AI models unequivocally suffer from a lack of sufficiently large and unbiased datasets. Here, we systematically investigated the data biases in the PDBbind and DUD-E datasets. We examined the performance of an atomic convolutional neural network (ACNN) on the PDBbind core set and achieved a Pearson R² of 0.73 between experimental and predicted binding affinities. Strikingly, the ACNN models did not need to learn the essential protein-ligand interactions in complex structures: they achieved similar performance even on datasets containing only ligand structures or only protein structures. In contrast, data splitting based on similarity clustering (protein sequence or ligand scaffold) significantly reduced the model performance. We also identified property and topology biases in the DUD-E dataset, which led to artificially increased enrichment performance in virtual screening. The property bias in DUD-E was reduced by enforcing more stringent ligand property matching rules, while the topology bias still exists due to the use of molecular fingerprint similarity as a decoy selection criterion. Therefore, we believe that sufficiently large and unbiased datasets are desirable for training robust AI models to accurately predict protein-ligand interactions.
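The similarity-clustered splitting mentioned in the abstract can be illustrated with a minimal sketch. It assumes each complex already carries a cluster label (for example, a protein-sequence cluster or a ligand-scaffold cluster); the function name `cluster_split`, the toy labels, and the 25% test fraction are hypothetical and are not taken from the paper. The point is only that whole clusters, rather than individual samples, are assigned to the training or test set, so similar proteins or ligands cannot appear on both sides.

```python
import random
from collections import defaultdict


def cluster_split(cluster_ids, test_fraction=0.2, seed=0):
    """Split sample indices so that no cluster spans both train and test."""
    # Group sample indices by their (hypothetical) cluster label.
    clusters = defaultdict(list)
    for index, cluster_id in enumerate(cluster_ids):
        clusters[cluster_id].append(index)

    # Shuffle whole clusters, not individual samples.
    ordered = sorted(clusters)
    random.Random(seed).shuffle(ordered)

    target = test_fraction * len(cluster_ids)
    train, test = [], []
    for cluster_id in ordered:
        # Fill the test set cluster by cluster until it reaches the target size.
        if len(test) < target:
            test.extend(clusters[cluster_id])
        else:
            train.extend(clusters[cluster_id])
    return sorted(train), sorted(test)


# Toy cluster labels standing in for sequence or scaffold clusters.
labels = ["kinase", "kinase", "protease", "protease", "gpcr", "gpcr", "kinase", "nr"]
train_idx, test_idx = cluster_split(labels, test_fraction=0.25)
```

Because clusters are kept intact, the test set may end up somewhat larger than the requested fraction. A random split, by contrast, would shuffle the individual indices and could place near-identical complexes in both sets.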

Highlights

Structure-based virtual screening has been widely used to discover new ligands based on target structures (Kitchen et al., 2004; Shoichet, 2004; Irwin and Shoichet, 2016; Zhou et al., 2016; Wang et al., 2017; Lyu et al., 2019; Peng et al., 2019).
We evaluated the performance of the atomic convolutional neural network (ACNN) model in predicting protein-ligand binding affinities on the PDBbind datasets using different data splitting approaches.
PDBbind, a collection of experimentally determined protein-ligand complex structures with known binding affinities, is reliable, but the amount of data is small and arguably suffers from data redundancy caused by protein and ligand similarity.

Summary

Introduction

Structure-based virtual screening (molecular docking) has been widely used to discover new ligands based on target structures (Kitchen et al., 2004; Shoichet, 2004; Irwin and Shoichet, 2016; Zhou et al., 2016; Wang et al., 2017; Lyu et al., 2019; Peng et al., 2019). At the heart of molecular docking is the scoring function, which estimates the binding affinities of protein-ligand complexes. The performance of virtual screening has been evaluated on several publicly available benchmarking datasets, including the Community Structure-Activity Resource (CSAR) (Dunbar et al., 2011), PDBbind (Liu et al., 2017), the Directory of Useful Decoys (DUD) (Huang et al., 2006b), and the Directory of Useful Decoys - Enhanced (DUD-E) (Mysinger et al., 2012). The CSAR and PDBbind datasets were compiled to facilitate the prediction of binding affinities based on experimental complex structures. The DUD and DUD-E datasets were originally designed to assess docking enrichment performance by distinguishing the annotated actives from a large database of computationally generated non-binding decoy molecules.
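To make "enrichment performance" concrete, the sketch below computes a commonly used enrichment factor: the fraction of actives among the top-ranked molecules divided by the fraction of actives in the whole library. This is one standard definition and not necessarily the metric used in the paper; the function name `enrichment_factor`, the toy scores, and the convention that higher scores rank first are assumptions made for illustration (docking scores are often lower-is-better, in which case they would be negated first).

```python
def enrichment_factor(scores, is_active, top_fraction=0.01):
    """Enrichment factor of actives in the top-ranked fraction of a library.

    Assumes higher scores mean better-ranked molecules.
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    n_total = len(scores)
    n_actives = sum(is_active)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need at least one molecule and one active")

    # Number of molecules in the inspected top slice (at least one).
    n_top = max(1, round(top_fraction * n_total))

    # Rank molecules from best to worst score and count actives in the slice.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits = sum(active for _, active in ranked[:n_top])

    # Hit rate in the top slice relative to the hit rate of the whole library.
    return (hits / n_top) / (n_actives / n_total)


# Toy library of ten molecules containing two actives (1) and eight decoys (0).
scores = [9.1, 8.7, 5.0, 4.2, 3.9, 3.1, 2.8, 2.0, 1.5, 0.4]
actives_ranked_first = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
actives_ranked_low = [0, 0, 0, 0, 1, 0, 0, 0, 0, 1]

ef_best = enrichment_factor(scores, actives_ranked_first, top_fraction=0.2)
ef_poor = enrichment_factor(scores, actives_ranked_low, top_fraction=0.2)
```

A value of 1 corresponds to random ranking. If decoys differ systematically from actives in simple properties or topology, a model can reach high enrichment without capturing protein-ligand interactions, which is the kind of bias the abstract describes for DUD-E.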



Journal: Frontiers in Pharmacology
Publication Date: Feb 25, 2020
Citations: 93
License type: CC BY 4.0
