Background
Locating small-molecule binding sites in target proteins, at either pocket or residue resolution, is critical in many drug-discovery scenarios. Because such binding sites are not always easy to find with conventional methods, a variety of deep learning methods for predicting binding sites from protein structures have been developed in recent years. The existing deep learning based methods have several limitations: (1) the inefficiency of CNN-only architectures, (2) loss of information due to excessive post-processing, and (3) under-utilization of available data sources.

Methods
We present a new model architecture and training method that resolve these problems. First, by layering geometric self-attention units on top of residue-level 3D CNN outputs, our model overcomes the shortcomings of CNN-only architectures. Second, by making residues and pockets, rather than voxels, the fundamental units of computation, our method reduces the information lost in post-processing. Finally, by employing inter-resolution transfer learning and homology-based augmentation, our method makes substantially fuller use of the available data sources.

Results
The proposed method significantly outperformed all state-of-the-art baselines at both resolutions, pocket and residue. An ablation study demonstrated that the proposed architecture, as well as transfer learning and homology-based augmentation, is indispensable for optimal performance. We further scrutinized the model's behavior through a case study on human serum albumin, in which it identified the protein's multiple binding sites more accurately than the existing methods.

Conclusions
We believe that our contribution to the literature is twofold. First, we introduce a novel computational method for binding site prediction with practical applications, substantiated by its strong performance across diverse benchmarks and case studies. Second, the innovative aspects of our method, specifically the design of the model architecture, inter-resolution transfer learning, and homology-based augmentation, could serve as useful components for future work.
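To make the Methods description concrete, the following is a minimal, hedged sketch of geometric self-attention over per-residue features. It is not the authors' implementation: the function name, the use of a negative pairwise-distance bias on the attention logits, and the absence of learned query/key/value projections are all illustrative assumptions; in the actual model, the input features would come from the residue-level 3D CNN.

```python
import numpy as np

def geometric_self_attention(feats, coords, scale=1.0):
    """Illustrative sketch (not the paper's code).

    feats:  (N, d) per-residue features (e.g. residue-level 3D CNN outputs)
    coords: (N, 3) residue coordinates (e.g. C-alpha positions)

    Attention logits are biased by negative pairwise distances, so
    spatially close residues attend to each other more strongly.
    """
    n, d = feats.shape
    # A real model would use learned projections W_q, W_k, W_v here.
    q, k, v = feats, feats, feats
    logits = q @ k.T / np.sqrt(d)
    # Pairwise Euclidean distances between residues, shape (N, N).
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    logits = logits - scale * dists  # geometric bias: down-weight distant pairs
    # Row-wise softmax over attention logits.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (N, d) updated residue representations

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 16))
coords = rng.standard_normal((8, 3)) * 10.0
out = geometric_self_attention(feats, coords)
print(out.shape)  # (8, 16)
```

Per-residue outputs of such a layer could then be scored directly for residue-level prediction, or pooled over candidate pockets for pocket-level prediction, which is consistent with using residues and pockets rather than voxels as the units of computation.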