Abstract

A key goal in immuno-oncology is the identification of tumor antigens recognized by CD8 T cells. Immune modulators such as PD-1 inhibitors indirectly promote T-cell attack against tumor antigens, and may be augmented by antigen-directed therapeutic immunization or adoptive cell therapy to increase clinical benefit. Such personalized therapeutics require accurate antigen identification from patient samples, which remains elusive today. Methods: We generated the largest reported dataset of human tumor transcriptomes and HLA class I peptidomes (N=111) from specimens of multiple tumor types. We used these data to train a deep learning model of HLA peptide presentation for antigen prediction. Our model architecture addressed two key challenges: (1) learning HLA-allele-specific models from tumor data in which each sample expressed up to 6 unique HLA class I alleles and (2) incorporating information about all aspects of HLA presentation, including gene expression, proteasomal processing, and stable binding of peptides to HLA. We evaluated the performance of the model on two independent test datasets. First, we tested the model on HLA-presented peptides from five held-out tumor mass spectrometry (MS) samples. Then, to establish that accurate prediction of HLA presentation translates to prediction of antigens recognized by T cells in vivo, we compiled from three published studies a dataset of >2,000 mutations in 16 patients, with 23 mutations (i.e., neoantigens) recognized by PD-1+ PBMC or TIL CD8 T cells. Recognition of a peptide by TIL or activated peripheral T cells implies not only tumor presentation of the peptide, but also its ability to prime T-cell responses, and thus represents the most stringent test of tumor antigen prediction. Results: The model markedly improved prediction accuracy over standard HLA binding affinity prediction on both test datasets.
On the mass spectrometry test data, it achieved a >10-fold improvement in positive predictive value (PPV) vs standard HLA binding affinity prediction (~50% PPV at 40% recall for the MS-based model vs ~5% PPV for binding affinity prediction). On the T-cell test dataset, the model ranked T-cell-recognized neoantigens >4-fold higher than standard prediction (median rank ~7 for the MS-based model vs ~30 for binding affinity). When selecting candidate neoantigens for a hypothetical 10-neoantigen personalized immunotherapy, the model prioritized at least 1 T-cell-recognized neoantigen in the top 10 for 9/11 patients with neoantigen responses vs 3/11 for binding affinity. For a hypothetical 20-neoantigen immunotherapy, the model correctly selected the majority (16/23, 70%) of recognized neoantigens. Conclusion: We used the largest dataset of tumor transcriptomes and HLA peptidomes reported to date to train a deep learning model of HLA epitope presentation. The new model significantly outperforms state-of-the-art methods, and has sufficient predictive accuracy for in silico antigen selection for personalized cancer immunotherapy.

Citation Format: Brendan Bulik-Sullivan, Jennifer Busby, Matthew Davis, Lauren Young, Tyler Murphy, Andrew Clark, Fujiko Duke, Michele Busby, Adnan Derti, Mojca Skoberne, Karin Jooss, Corinne E. Gustafson, Assunta De Rienzo, William G. Richards, Nhien T. Dao, Hyeong R. Kim, Jamie E. Anderson, Chang-Min Choi, Vincent De Montpreville, Se Jin Jang, Olaf Mercier, Raphael Bueno, Elie Fadel, Joshua Francis, Roman Yelensky. Antigen identification for cancer immunotherapy by deep learning on tumor HLA peptides [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 5722.
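
The two kinds of metric quoted in the Results (PPV at a fixed recall, and the number of recognized neoantigens among the top-k ranked candidates) can be illustrated with a short sketch. This is not the authors' evaluation code: the function names `ppv_at_recall` and `hits_in_top_k`, the exact definition of PPV at a recall threshold, and the toy scores and labels are all assumptions made for illustration only.

```python
import math


def _rank(scores, labels):
    """Pair each score with its label and sort by descending score."""
    return sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)


def ppv_at_recall(scores, labels, recall):
    """PPV among the top-ranked candidates needed to reach the target recall.

    Assumed definition: walk down the ranked list until the required
    fraction of all positives has been found, then report the fraction
    of candidates seen so far that are positive.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Small epsilon guards against floating-point overshoot before ceil().
    needed = max(1, math.ceil(recall * n_pos - 1e-9))
    found = 0
    for k, (_, label) in enumerate(_rank(scores, labels), start=1):
        found += label
        if found >= needed:
            return found / k
    raise RuntimeError("unreachable: all positives were scanned")


def hits_in_top_k(scores, labels, k):
    """Number of positives among the k highest-scoring candidates."""
    return sum(label for _, label in _rank(scores, labels)[:k])


# Hypothetical toy data: 8 candidate peptides, 4 of them true positives.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0, 1, 0, 1]

print(ppv_at_recall(toy_scores, toy_labels, recall=0.5))
print(hits_in_top_k(toy_scores, toy_labels, k=3))
```

Under this assumed definition, comparing two predictors amounts to calling the same functions with each predictor's scores on the same labels, for example a presentation-model score versus a negated binding affinity, so that a higher value always means a stronger candidate.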