PLAS-20k: Extended Dataset of Protein-Ligand Affinities from MD Simulations for Machine Learning Applications

Divya B Korlepara, Rakesh Srivastava, Shubham Sharma, U Deva Priyakumar, Aathira G Nair, Divya Nayar, Saalim H Raza, Vishal Kumar, Shruti Jeurkar, Indhu Ramachandran, Pradeep Kumar Pal, Vasavi C S, Shivangi Verma, Prathit Chatterjee, Sanjana Pandey, Shivam Pandit, Reena Jaglan, Kavita Thakran

doi:10.1038/s41597-023-02872-y

Abstract

Computing binding affinities is of great importance in the drug discovery pipeline, and predicting them with advanced machine learning methods remains a major challenge because existing datasets and models do not consider the dynamic features of protein-ligand interactions. To this end, we have developed the PLAS-20k dataset, an extension of the previously developed PLAS-5k, with 97,500 independent simulations on a total of 19,500 different protein-ligand complexes. Our results show good correlation with the available experimental values and perform better than docking scores. This holds true even for a subset of ligands that follow Lipinski’s rule, and for diverse clusters of complex structures, highlighting the importance of the PLAS-20k dataset in developing new ML models. In addition, our dataset is more useful than docking for classifying strong and weak binders. Further, the OnionNet model has been retrained on the PLAS-20k dataset and is provided as a baseline for the prediction of binding affinities. We believe that large-scale MD-based datasets, along with their trajectories, will form a new synergy, paving the way for accelerating drug discovery.
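As an illustration of the kind of comparison the abstract describes, the following is a minimal sketch of how one might filter complexes by Lipinski’s rule of five and then measure the Pearson correlation between computed and experimental affinities. The `Complex` record, its field names, and the toy values are all hypothetical placeholders, not the PLAS-20k schema or data, and the strict "no violations" filter is an assumption about how the rule is applied.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Complex:
    """Hypothetical record; field names are assumptions, not the PLAS-20k schema."""
    name: str
    mol_weight: float       # ligand molecular weight, Da
    log_p: float            # ligand octanol-water partition coefficient
    h_donors: int           # hydrogen-bond donors
    h_acceptors: int        # hydrogen-bond acceptors
    computed_dg: float      # computed binding affinity, kcal/mol
    experimental_dg: float  # experimental binding affinity, kcal/mol


def passes_lipinski(c: Complex) -> bool:
    """Strict rule-of-five check: all four criteria must hold (an assumption)."""
    return (
        c.mol_weight <= 500
        and c.log_p <= 5
        and c.h_donors <= 5
        and c.h_acceptors <= 10
    )


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Made-up toy values for illustration only; these are NOT results from the paper.
toy_complexes = [
    Complex("toy_a", 350.0, 2.1, 2, 5, -8.0, -7.5),
    Complex("toy_b", 620.0, 4.0, 3, 8, -11.0, -9.0),  # fails the weight criterion
    Complex("toy_c", 410.0, 3.4, 1, 6, -10.2, -9.8),
    Complex("toy_d", 290.0, 1.0, 3, 4, -6.1, -6.4),
]

# Keep only the Lipinski-compliant complexes, then correlate the two columns.
subset = [c for c in toy_complexes if passes_lipinski(c)]
r = pearson_r(
    [c.computed_dg for c in subset],
    [c.experimental_dg for c in subset],
)
print(f"{len(subset)} compliant complexes, Pearson r = {r:.3f}")
```

The same `pearson_r` helper could be applied to docking scores against experimental values to compare the two methods on an identical subset.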


Journal: Scientific Data
Publication Date: Feb 9, 2024
Citations: 2
License type: CC BY 4.0
