Abstract
High-quality data on protein-ligand complex structures and binding affinities are crucial for structure-based drug design. Existing datasets are often limited in size and diversity, which hinders a comprehensive understanding of protein-ligand interactions. Here, we present BindingNet v2, an expanded dataset comprising 689,796 modeled protein-ligand binding complexes across 1,794 protein targets. It was constructed with an enhanced version of the template-based modeling workflow from BindingNet v1 that incorporates pharmacophore and molecular shape similarities. We evaluated BindingNet v2 on binding pose generation and found that it improves the generalization of the Uni-Mol model to novel ligands. The success rate on the PoseBusters dataset increased from 38.55% when training on the PDBbind dataset alone to 64.25% when the training data were augmented with BindingNet v2. Coupled with physics-based refinement, the success rate rose to 74.07% while passing PoseBusters validity checks. These results highlight the value of larger, more diverse datasets in improving the accuracy and reliability of deep learning models for binding pose prediction.