Peptides are pivotal in numerous biological activities, mediating up to 40% of protein-protein interactions in many cellular processes. Owing to their exceptional specificity and effectiveness, peptides have emerged as promising candidates for drug design. However, accurately predicting protein-peptide binding affinity remains challenging. To address this problem, we develop PepPAP, a prediction model based on a convolutional neural network and multi-head attention that relies solely on sequence features. These features include physicochemical properties, intrinsic disorder, sequence encoding, and, in particular, interface propensity extracted from 16,689 non-redundant protein-peptide complexes. Notably, the regression stratification cross-validation scheme proposed in our previous work improves prediction for cases with extreme binding affinity values. On three benchmark test datasets (T100, a series of peptides targeting the PDZ domain, and a series of peptides targeting CXCR4), PepPAP outperforms existing methods, demonstrating good generalization ability. Furthermore, PepPAP performs well in binary interaction prediction, and visualization of the feature space distribution further highlights its effectiveness. To the best of our knowledge, PepPAP is the first sequence-based deep attention model for genome-wide protein-peptide binding affinity prediction, and it has the potential to offer valuable insights for peptide-based drug design.
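The abstract does not spell out the regression stratification cross-validation scheme, so the following is only a minimal sketch of one common way to stratify folds on a continuous target, not the authors' implementation. It assumes a "sorted stratification" strategy: samples are ordered by binding affinity, cut into consecutive blocks of `n_splits` samples, and each block is spread across the folds so that every fold receives both low-affinity and high-affinity cases. The function names `stratified_regression_folds` and `iter_splits` are hypothetical.

```python
import random


def stratified_regression_folds(y, n_splits=5, seed=0):
    """Assign each sample a fold label so folds share a similar target range.

    Assumption: sorted stratification. Samples are sorted by target value,
    grouped into consecutive blocks of n_splits, and the members of each
    block are assigned to distinct folds in random order.
    """
    rng = random.Random(seed)
    # Indices of samples ordered from lowest to highest target value.
    order = sorted(range(len(y)), key=lambda i: y[i])
    fold_of = [0] * len(y)
    for start in range(0, len(order), n_splits):
        block = order[start:start + n_splits]
        labels = list(range(n_splits))
        rng.shuffle(labels)
        # Each sample in the block goes to a different fold.
        for idx, label in zip(block, labels):
            fold_of[idx] = label
    return fold_of


def iter_splits(y, n_splits=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    fold_of = stratified_regression_folds(y, n_splits, seed)
    for k in range(n_splits):
        train = [i for i, f in enumerate(fold_of) if f != k]
        test = [i for i, f in enumerate(fold_of) if f == k]
        yield train, test
```

Under this assumption, the few samples with the most extreme affinities are never concentrated in a single fold, which is the property the abstract credits for improved prediction at the extremes.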