Knowledge of B cell epitopes is critical to vaccine design, diagnostics, and therapeutics. Because experimental validation of epitopes is time-consuming and costly, many in silico tools have been developed to predict B cell epitopes computationally. Although most of these methods perform poorly, deep learning methods have shown promising results in recent years. We developed a method called EpiGraph that outperformed previous methods, including those that showed a significant improvement in performance in recent years. Our model's performance can be attributed to three factors: (1) a combination of structure and sequence feature embeddings, obtained from the pretrained ESM-IF1 and ESM-2 models, captures the structural and evolutionary features of B cell epitopes; (2) a graph attention network learns the spatial proximity of B cell epitopes, whose graphs exhibit high homophily; and (3) residual connections in the model framework mitigate the over-smoothing problem in the graph neural network. Our model achieved the highest performance on an independent benchmark dataset, and the results were consistent on a different dataset. The datasets and source code are available at https://github.com/sj584/EpiGraph, and a user-friendly web server is freely available at http://epigraph.kaist.ac.kr.
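To make the notion of graph homophily in factor (2) concrete, the sketch below builds a residue contact graph from 3D coordinates and computes the edge homophily ratio, that is, the fraction of edges whose two endpoints carry the same epitope label. This is only an illustration: the abstract does not state how the graph is constructed or which homophily measure is used, so the distance cutoff, the toy coordinates, the labels, and the function names `build_contact_edges` and `edge_homophily` are all assumptions.

```python
import math


def build_contact_edges(coords, cutoff):
    """Return undirected edges (i, j) with i < j for residue pairs within `cutoff`."""
    edges = []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.append((i, j))
    return edges


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints share the same label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for i, j in edges if labels[i] == labels[j])
    return same / len(edges)


# Hypothetical toy data: five residues on a line, 3 units apart.
# Label 1 marks an epitope residue, 0 a non-epitope residue (assumed encoding).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0),
          (9.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
labels = [1, 1, 1, 0, 0]

edges = build_contact_edges(coords, cutoff=4.0)  # cutoff value is an assumption
h = edge_homophily(edges, labels)
print(edges, h)
```

In this toy chain only the edge between residues 2 and 3 joins an epitope residue to a non-epitope residue, so three of the four edges are homophilous. A ratio close to 1 means epitope residues are mostly in contact with other epitope residues, which is the setting in which message passing over spatial neighbors is expected to help.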