Abstract

Background: Accurate prediction of epitopes presented by human leukocyte antigen (HLA) is crucial for personalized cancer immunotherapies targeting T cell epitopes. Mass spectrometry (MS) profiling of eluted HLA ligands, which provides unbiased, high-throughput measurements of HLA-associated peptides resulting from in vivo cellular processing, can supply highly valuable training data for building predictive models of HLA binding. In addition, gene expression profiles measured by RNA-seq in a specific cell type could significantly improve the positive predictive value (PPV) of epitope presentation prediction. Although a large amount of high-quality MS data on HLA-bound peptides has been generated in the last few years, few of these datasets provide matching RNA-seq data, which makes it difficult to incorporate gene expression into epitope prediction. Here, we aim to develop a publicly available prediction tool that incorporates both sources of information, and to demonstrate its superior performance over existing methods.

Methods: We obtained public HLA peptidome datasets with matching RNA-seq data for twelve cell lines derived from multiple tissues. We used these MS HLA ligand data to build position-specific scoring matrices (PSSMs) for five HLA-I alleles across these cell lines. We then used logistic regression to model how PSSM score, gene expression, and peptide length distribution relate to whether a peptide is presented in each of the twelve cell lines, and compared the resulting feature weights across cell lines.

Results: We found that the feature weights across different HLA-I alleles and cell lines were close to each other, suggesting that there is a universal relationship between PSSM score and gene expression across different cell lines that could be applied to epitope presentation prediction for multiple alleles in diverse tissues.
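The modeling steps described in the Methods can be illustrated with a minimal sketch. It is not the EPIC implementation: the uniform background, the pseudocount, the `log2(TPM + 1)` transform, the encoding of peptide length as a single fraction, and every weight value are assumptions chosen for illustration, and the four peptides are toy inputs only.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(ligands, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length peptides.

    Assumption: uniform amino acid background and an additive pseudocount.
    """
    length = len(ligands[0])
    if any(len(p) != length for p in ligands):
        raise ValueError("all ligands must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(ligands) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in ligands)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm


def pssm_score(pssm, peptide):
    """Sum the per-position log-odds of a peptide."""
    return sum(column[aa] for column, aa in zip(pssm, peptide))


def presentation_probability(pssm_value, tpm, length_fraction, weights):
    """Logistic combination of the three feature types named in the abstract.

    Assumptions: expression enters as log2(TPM + 1), and peptide length is
    encoded as the fraction of known ligands that share the peptide's length.
    """
    z = (weights["intercept"]
         + weights["pssm"] * pssm_value
         + weights["expression"] * math.log2(tpm + 1.0)
         + weights["length"] * length_fraction)
    return 1.0 / (1.0 + math.exp(-z))


# Toy 9-mers for illustration only.
toy_ligands = ["SLYNTVATL", "GLCTLVAML", "NLVPMVATV", "ALDKATVLL"]
toy_pssm = build_pssm(toy_ligands)

# Hypothetical weights; real weights would be fitted by logistic regression.
toy_weights = {"intercept": -6.0, "pssm": 0.5, "expression": 0.4, "length": 1.0}

score = pssm_score(toy_pssm, "SLYNTVATL")
prob = presentation_probability(score, tpm=50.0, length_fraction=0.7,
                                weights=toy_weights)
```

In this sketch, applying "universal" weights would amount to reusing one `weights` dictionary for every cell line instead of fitting a separate one per cell line.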
When we replaced the cell-line-specific weights with universal weights summarized from all the cell lines, the logistic regression model’s predictive power for each cell line dropped only slightly and still substantially outperformed predictions based on PSSM scores alone. Based on this finding, we applied the universal feature weights to more than 180,000 unique HLA ligands collected from public HLA peptidomics datasets, and present an Epitope Presentation Integrated prediCtion (EPIC) model for 66 HLA alleles. EPIC was substantially better than other popular methods, including MixMHCpred, NetMHCpan (v4.0), and MHCflurry, when evaluated on independent HLA eluted ligand datasets, with an average 0.1% PPV of 53.58%, compared to 40.50%, 40.20%, 29.81%, and 25.57% achieved by MixMHCpred, NetMHCpan (EL), NetMHCpan (BA), and MHCflurry, respectively.

Conclusion: By integrating MS and expression data, EPIC is superior to currently available methods in predicting epitope presentation for the 66 common HLA alleles that our models were built on.

Citation Format: Weipeng Hu, Si Qiu, Youping Li, Geng Liu, Xiuqing Zhang, Leo J Lee. EPIC: MHC-I epitope prediction integrating mass spectrometry derived motifs and tissue-specific expression profiles [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2019; 2019 Mar 29-Apr 3; Atlanta, GA. Philadelphia (PA): AACR; Cancer Res 2019;79(13 Suppl):Abstract nr 3383.
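The abstract does not define the 0.1% PPV metric. One plausible reading, taken here as an assumption, is the fraction of true ligands among the top-scoring 0.1% of all evaluated peptides (true ligands mixed with a large excess of decoys). A minimal sketch under that assumption, with a hypothetical function name and toy scores:

```python
def ppv_at_top_fraction(scores, labels, fraction=0.001):
    """Fraction of true ligands (label 1) among the top `fraction` of scores.

    Assumption: "0.1% PPV" means precision within the top-scoring 0.1% of
    all evaluated peptides; the abstract does not spell this out.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:n_top])
    return hits / n_top


# Toy evaluation set: 10 true ligands among 10,000 peptides (made-up scores).
toy_labels = [1] * 10 + [0] * 9990
toy_scores = ([2.0] * 6                               # ligands ranked at the top
              + [-1.0] * 4                            # ligands ranked at the bottom
              + [1.0 - i * 1e-5 for i in range(9990)])  # decoys in between

toy_ppv = ppv_at_top_fraction(toy_scores, toy_labels)  # 6 of the top 10 are ligands
```

With this construction the top 10 peptides contain 6 true ligands, so `toy_ppv` is 0.6. The reported EPIC figure of 53.58% would correspond to an average of such values over the evaluation datasets, again under the assumed definition.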
