Abstract

Antigens presented on the cell surface have been subjected to multiple biological processes. Among them, C-terminal antigen processing constitutes one of the main bottlenecks of the peptide presentation pathways, as it delimits the peptidome available to the downstream steps. Here, we present NetCleave, an open-source and retrainable algorithm for the prediction of C-terminal antigen processing for both the MHC-I and MHC-II pathways. The NetCleave architecture consists of a neural network trained on 46 different physicochemical descriptors of the cleavage site amino acids. Our results demonstrate that prediction of C-terminal antigen processing achieves high accuracy on MHC-I (AUC of 0.91), while it remains challenging for MHC-II (AUC of 0.66). Moreover, we evaluated the performance of NetCleave and other prediction tools on four independent immunogenicity datasets (H2-Db, H2-Kb, HLA-A*02:01 and HLA-B*07:02). Overall, we demonstrate that NetCleave stands out as one of the best algorithms for the prediction of C-terminal processing, and we provide some of the first evidence that C-terminal processing predictions may help in the discovery of immunogenic peptides.

Highlights

  • The adaptive immune system has evolved to locate, degrade and expose antigen sources to the T-cell repertoire, aiming to eliminate potential threats

  • Recent advances in personalized immunotherapy techniques have encouraged the use of computational tools for the prediction of immunogenic antigens or neoantigens for vaccination efforts

  • A large number of variables could play important roles, which increases the complexity of the overall prediction


Summary

Introduction

The adaptive immune system has evolved to locate, degrade and expose antigen sources to the T-cell repertoire, aiming to eliminate potential threats. This herculean task is accomplished by the antigen presentation pathways, which are composed of a complex network of specialized cells, proteolytic enzymes, peptide recognition and transportation, and protein–protein binding events. Several cathepsins with different cleavage specificities have been described, belonging to the serine proteases (cathepsin A and G), aspartic proteases (cathepsin D and E) and cysteine proteases (cathepsin B, C, F, H, K, L, O, S, V, X, and W)[13]. Some of these proteolytic enzymes are poorly characterized, which hampers the development of efficient predictive algorithms for the overall process. Instead of following a one-hot encoding scheme, we fed our neural network with a set of 46 different amino acid descriptors (16 hydrophobic, 17 steric and 15 electronic features) that are publicly available[24], as previously used in the literature[25].
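The difference between one-hot encoding and descriptor-based encoding can be illustrated with a minimal sketch. It is not the NetCleave implementation: the descriptor table is filled with random placeholder numbers rather than the published values[24], and the example window, its length, and all function names are assumptions made for illustration only.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
N_DESCRIPTORS = 46  # number of descriptors per residue stated in the text


def one_hot(residue):
    """Return a 20-dimensional indicator vector for one residue."""
    return [1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]


def make_placeholder_table(seed=0):
    """Build a residue -> 46-value lookup table of PLACEHOLDER numbers.

    The values are random stand-ins, not the published descriptors;
    a real table would be loaded from the cited descriptor set.
    """
    rng = random.Random(seed)
    return {
        aa: [rng.uniform(-1.0, 1.0) for _ in range(N_DESCRIPTORS)]
        for aa in AMINO_ACIDS
    }


def encode_window(window, table):
    """Concatenate the per-residue descriptor vectors of a sequence window."""
    features = []
    for residue in window:
        features.extend(table[residue])
    return features


table = make_placeholder_table()
window = "AKLVTSR"  # arbitrary 7-residue window; the length is an assumption

x_one_hot = [value for residue in window for value in one_hot(residue)]
x_descriptor = encode_window(window, table)

# One-hot: 7 residues x 20 = 140 sparse values.
# Descriptors: 7 residues x 46 = 322 dense values.
print(len(x_one_hot), len(x_descriptor))  # prints: 140 322
```

With one-hot vectors every pair of distinct residues is equally dissimilar, whereas a descriptor table lets residues with similar hydrophobic, steric or electronic properties receive similar inputs, which is the motivation the text gives for this choice.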
