Abstract 5364: Language modeling of peptide-HLA interactions achieves state-of-the-art performance on prediction of peptide presentation by HLA Class II

Daniel J Sprague, Karin Jooss, Ankur Dhanik, Monica Lane, Olivia Petrillo, Melissa Rotunno, Matthew Davis, Ítalo Faria Do Valle, Joshua S Klein

doi:10.1158/1538-7445.am2023-5364

Abstract

Precise and sensitive prediction of neoantigen presentation to the immune system via human leukocyte antigen (HLA) class II molecules remains a challenge despite the early success of neural networks applied to HLA class I. Addressing this modeling challenge is necessary because presentation of a neoantigen epitope by both classes of HLA molecules may be valuable for inducing a sustained immune response with therapeutic cancer vaccines. We previously developed a machine-learning-based platform, EDGE™, that provides a state-of-the-art model to predict presentation of peptides by HLA class I. Here we propose a new addition to the EDGE™ platform: a model that leverages structural information of putative epitopes and HLA class II alleles from their in situ context to predict presentation of peptides by HLA class II. The model does this by building on the Evolutionary Scale Model pre-trained protein language model (LM), which has been demonstrated to embed protein sequences with rich structural information. The input to the model is a linear peptide consisting of an epitope and its flanking amino acids, concatenated with structurally relevant amino acids from each HLA allele. This allows the model to treat the problem entirely as a natural language processing task, which minimizes the imputation of covariates found in prior approaches when performing inference in the context of vaccine design, while maximizing the richness of the LM embeddings on longer linear peptides. Crucially, it also allows the model to generalize to any allele that has a known sequence. Additionally, DR-, DP-, and DQ-specific immunoaffinity-purified mass spectrometry multi-allelic (MA) presentation data were generated per tumor or cell line sample, spanning 89 alleles in aggregate.
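As an illustration of the input format described above, the following is a minimal sketch of how an epitope, its flanking residues, and allele-derived residues might be joined into one linear sequence. The abstract does not give the flank length, the ordering of the parts, or how multiple alleles are handled, so the 5-residue flank, the flank-epitope-flank-allele order, the one-input-per-allele layout, and all function names and sequences here are assumptions made for illustration only. The allele residue strings are placeholders, not real HLA pseudo-sequences.

```python
# Hypothetical sketch: names, flank length, ordering, and sequences are
# illustrative assumptions, not details taken from the abstract.
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")


def extract_with_flanks(protein, start, end, flank=5):
    """Return (n_flank, epitope, c_flank) for the epitope protein[start:end].

    Flanks are truncated when the epitope sits near a protein terminus.
    """
    n_flank = protein[max(0, start - flank):start]
    epitope = protein[start:end]
    c_flank = protein[end:end + flank]
    return n_flank, epitope, c_flank


def build_model_input(protein, start, end, allele_residues, flank=5):
    """Concatenate flanked epitope and allele residues into one string."""
    n_flank, epitope, c_flank = extract_with_flanks(protein, start, end, flank)
    sequence = n_flank + epitope + c_flank + allele_residues
    unknown = set(sequence) - VALID_AA
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sequence


# Made-up source protein and placeholder allele residue strings.
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
alleles = {
    "allele_A": "QEFFIASGAAVDAIM",
    "allele_B": "WNSQKDILEQARAAV",
}

# One input per allele (an assumption about the multi-allelic layout).
inputs = {
    name: build_model_input(protein, 10, 25, residues)
    for name, residues in alleles.items()
}
for name, sequence in inputs.items():
    print(name, sequence)
```

Because the result is an ordinary amino-acid string, it could be passed to a protein language model tokenizer without extra covariate columns, which is the property the abstract emphasizes.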
We demonstrate that incrementally decreasing HLA class II allele MA resolution during training results in substantially improved predictions for situations where MA presentation data has completely ambiguous epitope presentation across DR/DP/DQ alleles. Our model achieves an Average Precision (AP) of 0.92 and ROC-AUC of 0.98 on the same benchmark validation data as the current state-of-the-art model BERTMHC, which achieved an AP of 0.81 and ROC-AUC of 0.95. To the best of our knowledge, these are the best AP and ROC-AUC for an HLA class II presentation model on this benchmark dataset. Our model is a significant advancement in HLA class II epitope prediction that allows our EDGE™ platform to bring neoantigen vaccine design optimized for both class I and class II presentation within reach.

Citation Format: Daniel Sprague, Joshua Klein, Italo Faria do Valle, Olivia Petrillo, Melissa Rotunno, Matthew Davis, Monica Lane, Karin Jooss, Ankur Dhanik. Language modeling of peptide-HLA interactions achieves state-of-the-art performance on prediction of peptide presentation by HLA Class II [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2023; Part 1 (Regular and Invited Abstracts); 2023 Apr 14-19; Orlando, FL. Philadelphia (PA): AACR; Cancer Res 2023;83(7_Suppl):Abstract nr 5364.
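For readers unfamiliar with the two metrics reported in the abstract, the sketch below computes Average Precision and ROC-AUC from binary presentation labels and model scores using only the Python standard library. The labels and scores are toy values invented for illustration and have no connection to the benchmark data. Tie handling is a simplifying assumption (ties in AP are ranked in input order; ties in ROC-AUC count as half a win), so results may differ slightly from library implementations when scores repeat.

```python
# Illustrative metric definitions on toy data; not the authors' evaluation code.

def average_precision(labels, scores):
    """Mean of the precision values measured at each positive, ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one positive label")
    return precision_sum / hits


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs both positive and negative labels")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = presented peptide, 0 = decoy.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
print(f"AP = {average_precision(toy_labels, toy_scores):.3f}")
print(f"ROC-AUC = {roc_auc(toy_labels, toy_scores):.3f}")
```

AP is sensitive to how many decoys outrank true positives at the top of the list, which makes it informative when presented peptides are rare, while ROC-AUC summarizes ranking quality across all positive-negative pairs.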
