Abstract
Neoantigens, which are antigens specific to cancer cells, can be harnessed to develop precision immunotherapies, such as personalized cancer vaccines, and prognostic biomarkers for checkpoint blockade. Next-generation sequencing technologies have enabled comprehensive profiling of putative neoantigens by interrogating the tumor exome and transcriptome, but accurate prediction of the peptides presented by MHC complexes remains a significant challenge. We present the Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), which addresses this critical need. SHERPA comprises highly sensitive, accurate, and pan-allelic MHC-peptide (MHCp) binding and presentation prediction models that were built using a multi-pronged strategy. First, we generated a large-scale, high-quality HLA ligandome using approximately 75 stably transfected mono-allelic K562 cell lines. Intact MHCp complexes were immunoprecipitated using the W6/32 antibody and profiled using LC-MS/MS. Second, we trained models that predict both MHCp binding and presentation. Briefly, MHCp binding was modeled using the amino acid sequences of the ligand and of the binding pocket of the cognate allele. MHCp presentation, which encompasses in vivo antigen processing, was modeled using multiple features, including the expression level of the source protein, proteasomal cleavage, and two novel features representing the presentation propensities of genes and of regions within gene bodies. Third, we expanded the scale and scope of our in-house dataset using a large curated repository of publicly available mono- and multi-allelic datasets, yielding more than 160 alleles and more than 1.6 million peptides. Integrating data from diverse cell lines and tissue types improved the generalizability of our models, which is critically important when applying them to patient samples.
Finally, we implemented a model-based deconvolution of multi-allelic datasets to generate pseudo-mono-allelic data, and developed an integrative machine learning architecture to model our expanded HLA ligandome. We evaluated the performance of our binding and presentation models on 10% held-out mono-allelic test data from multiple cell line sources. Across a range of recall values, the precision of both the binding and presentation models was markedly higher than that of NetMHCpan 4.0; their positive predictive values (PPV) were 0.59 and 0.73, respectively, compared with 0.38 for NetMHCpan 4.0. Additionally, strong concordance between raw and predicted motifs for alleles excluded from the training data indicated robust pan-allelic performance. When evaluated on 12 tissue samples profiled in-house, the SHERPA presentation model had consistently higher recall (90%) than NetMHCpan 4.0 (63%). This trend holds on external immunopeptidomics datasets from tumor samples. In summary, SHERPA enables precision neoantigen discovery.
Citation Format: Rachel Marty Pyke, Dattatreya Mellacheruvu, Steven Dea, Charles Abbott, Nick Phillips, Sejal Desai, Rena McClory, Steven Ketelaars, Pia Kvistborg, John West, Richard Chen, Sean Michael Boyle. Accurate modeling of antigen processing and MHC peptide presentation using large-scale immunopeptidomes and a novel machine learning framework [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 1898.
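The abstract does not state how PPV or precision at a given recall was computed. As a rough illustration only, the sketch below assumes a convention often used in immunopeptidomics benchmarks: PPV is the fraction of true ligands among the n top-scoring peptides, where n is the number of true ligands in the test set, and precision at a recall level is read off the ranked list at the first depth that recovers that share of true ligands. The function names `ppv_at_top_n` and `precision_at_recall` are hypothetical and are not part of SHERPA or NetMHCpan.

```python
import math
from typing import Sequence


def _ranked_labels(scores: Sequence[float], labels: Sequence[int]) -> list:
    """Return labels ordered from highest to lowest prediction score."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if sum(labels) == 0:
        raise ValueError("at least one true ligand (label 1) is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return [label for _, label in ranked]


def ppv_at_top_n(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Assumed PPV convention: share of true ligands among the top-n
    predictions, with n equal to the number of true ligands."""
    ranked = _ranked_labels(scores, labels)
    n_pos = sum(ranked)
    return sum(ranked[:n_pos]) / n_pos


def precision_at_recall(
    scores: Sequence[float], labels: Sequence[int], recall: float
) -> float:
    """Precision at the shallowest ranking depth that reaches `recall`."""
    if not 0.0 < recall <= 1.0:
        raise ValueError("recall must be in (0, 1]")
    ranked = _ranked_labels(scores, labels)
    needed = math.ceil(recall * sum(ranked))  # true ligands to recover
    hits = 0
    for depth, label in enumerate(ranked, start=1):
        hits += label
        if hits >= needed:
            return hits / depth
    raise RuntimeError("unreachable: all true ligands are in the ranking")


if __name__ == "__main__":
    # Toy data: 1 = observed ligand, 0 = decoy peptide (made-up values).
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    toy_labels = [1, 0, 1, 0, 0, 1]
    print(ppv_at_top_n(toy_scores, toy_labels))
    print(precision_at_recall(toy_scores, toy_labels, 0.5))
```

In practice the decoy-to-ligand ratio strongly affects PPV, so values from different benchmarks are comparable only when that ratio and the decoy sampling scheme match.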