Abstract

Protein-ligand interactions play an essential role in many biological processes, and prior knowledge of ligand binding sites is necessary for successful drug design. Many 3D structure- and sequence-based methods have been proposed for identifying ligand binding sites. The 3D structure-based methods typically achieve better binding site prediction than the sequence-based methods. However, as deep-learning techniques that can extract structural information from large-scale sequence data have been developed, the performance gap between 3D structure- and sequence-based methods is narrowing. Nonetheless, there remains room for improvement in sequence-based prediction. We propose Pseq2Sites, a sequence-based deep-learning model for predicting ligand binding sites. Pseq2Sites comprises a 1D convolutional neural network that extracts local features from the protein sequence and a position-based attention mechanism that captures long-distance dependencies between binding residues. To verify the effectiveness of the proposed method, we compared it with other state-of-the-art methods on three public datasets: COACH420, HOLO4K, and CSAR-NRC HiQ. Using only protein sequence information, Pseq2Sites outperformed 3D structure-based state-of-the-art methods on external test datasets. On the COACH420 dataset, Pseq2Sites identified 97% of the binding pockets (at a significance level δ = 0.5), which was 27% higher than the second-best model. Pseq2Sites also predicted binding sites well for proteins with low similarity to the training dataset. Our code is available at https://github.com/Blue1993/Pseq2Sites.
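
The two components named in the abstract, a 1D convolution for local sequence features followed by attention for long-range dependencies, can be illustrated with a minimal pure-Python sketch. This is not the Pseq2Sites implementation (see the linked repository for that). Everything here is an assumption made for illustration: the one-hot input encoding, the filter count and kernel size, the use of plain scaled dot-product self-attention with identity projections as a stand-in for the paper's position-based attention, and the randomly initialized, untrained weights. The function names are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def one_hot(seq):
    """Encode a sequence as a list of 20-dim one-hot vectors (assumed input encoding)."""
    return [[1.0 if aa == a else 0.0 for a in AMINO_ACIDS] for aa in seq]


def conv1d_same(x, kernels, bias):
    """Zero-padded 1D convolution over positions, followed by ReLU.

    x: L x C_in, kernels: C_out x K x C_in, bias: C_out. Returns L x C_out.
    """
    length, k = len(x), len(kernels[0])
    pad = k // 2
    out = []
    for i in range(length):
        row = []
        for w, b in zip(kernels, bias):
            s = b
            for j in range(k):
                p = i + j - pad
                if 0 <= p < length:  # positions outside the sequence act as zeros
                    s += sum(wc * xc for wc, xc in zip(w[j], x[p]))
            row.append(max(0.0, s))
        out.append(row)
    return out


def softmax(values):
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(h):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    A simplification standing in for the position-based attention in the paper.
    Every residue attends to every other residue, regardless of distance.
    """
    d = len(h[0])
    out = []
    for q in h:
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d) for key in h]
        weights = softmax(scores)
        out.append([sum(weights[j] * h[j][c] for j in range(len(h))) for c in range(d)])
    return out


def predict_binding_probs(seq, n_filters=8, kernel_size=5, seed=0):
    """Return one score in (0, 1) per residue. Weights are random, so scores are meaningless."""
    rng = random.Random(seed)
    kernels = [[[rng.gauss(0, 0.1) for _ in AMINO_ACIDS] for _ in range(kernel_size)]
               for _ in range(n_filters)]
    bias = [0.0] * n_filters
    head = [rng.gauss(0, 0.1) for _ in range(n_filters)]  # per-residue linear classifier
    local = conv1d_same(one_hot(seq), kernels, bias)      # local features
    context = self_attention(local)                       # long-range mixing
    return [1.0 / (1.0 + math.exp(-sum(w * c for w, c in zip(head, row))))
            for row in context]


if __name__ == "__main__":
    probs = predict_binding_probs("MKTAYIAKQR")  # arbitrary toy sequence
    print([round(p, 3) for p in probs])
```

The sketch only shows the data flow: the convolution sees a fixed window around each residue, while the attention step lets each residue's representation depend on all others, which is the motivation the abstract gives for combining the two. A trained model would learn the weights from labeled binding-site data.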
