DeepSeqPan, a novel deep convolutional neural network model for pan-specific class I HLA-peptide binding affinity prediction

Yuxin Cui, Zhonghao Liu, Jianjun Hu, Alireza Nasiri, Zheng Xiong, Ansi Zhang

doi:10.1038/s41598-018-37214-1

Open Access

Abstract

Interactions between human leukocyte antigens (HLAs) and peptides play a critical role in the human immune system. Accurate computational prediction of HLA-binding peptides can be used for peptide drug discovery. Currently, the best prediction algorithms are neural network-based pan-specific models, which take advantage of the large amount of data available across HLA alleles. However, current pan-specific models all rely on the pseudo sequence encoding to model the binding context, and this encoding is based on 34 positions identified from HLA protein-peptide bound structures in early works. In this work, we propose a novel deep convolutional neural network (DCNN) model for HLA-peptide binding prediction, in which the encoding of the HLA sequence and the binding context are both learned by the network itself, without requiring HLA-peptide bound structure information. Our DCNN model is also characterized by its binding context extraction layer and its dual outputs: a binding affinity output and a binding probability output. Evaluation on public benchmark datasets shows that our DeepSeqPan model, trained without HLA structural information, achieves state-of-the-art performance on a large number of HLA alleles with good generalization capability. Since our model only needs raw sequences from the HLA-peptide binding pairs, it can be applied to binding prediction for HLAs without structure information and can also be applied to other protein binding problems, such as protein-DNA and protein-RNA binding. The implementation code and trained models are freely available at https://github.com/pcpLiu/DeepSeqPan.

Highlights

Human leukocyte antigens (HLAs) are major histocompatibility complex (MHC) proteins located on the cell surface in humans
Allele-specific models are trained only on the binding peptides tested on a specific allele, so a separate allele-specific binding affinity prediction model is needed for each HLA allele
We propose DeepSeqPan, a novel deep convolutional neural network model for pan-specific HLA-peptide binding affinity prediction

Summary

Introduction

Human leukocyte antigens (HLAs) are major histocompatibility complex (MHC) proteins located on the cell surface in humans. NetMHCPan, PickPocket, and the method of Kim et al. are recently proposed pan-specific HLA-peptide binding prediction models trained on the large amount of HLA class I binding affinity data. NetMHCPan was the first pan-specific binding affinity prediction algorithm to use a large number of peptide-HLA binding samples from different HLA alleles for model training, and it obtained state-of-the-art performance[6]. The network with the highest prediction performance (lowest squared error) on the test set was selected as the final prediction model[6]. This pseudo sequence encoding approach for pan-specific modeling has also been used in PickPocket[11] and Kim’s algorithm[12], but with different machine learning algorithms for model training. Each pocket library entry is characterized by nine pairs, where each pair consists of a list of pocket amino acids and a specificity vector.
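To make the contrast between a pseudo sequence and a raw-sequence input concrete, the following is a minimal sketch and not the DeepSeqPan implementation. The position list, the toy HLA fragment, and the helper names (`one_hot`, `pseudo_sequence`) are assumptions introduced purely for illustration; in particular, the three positions used here are not the 34 structure-derived positions referred to in the abstract, and the one-hot scheme is only one plausible way to present raw sequences to a network.

```python
# Illustrative sketch only: the positions and the HLA fragment below are
# hypothetical and are not the 34 structure-derived positions or a full allele.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot(sequence):
    """Encode an amino-acid string as a list of 20-dimensional one-hot rows."""
    rows = []
    for aa in sequence:
        row = [0] * len(AMINO_ACIDS)
        if aa in AA_INDEX:  # unknown residues (e.g. 'X') stay all-zero
            row[AA_INDEX[aa]] = 1
        rows.append(row)
    return rows


def pseudo_sequence(hla_sequence, positions):
    """Keep only the residues at the given 1-based positions."""
    return "".join(hla_sequence[p - 1] for p in positions)


HYPOTHETICAL_POSITIONS = [2, 5, 7]  # stand-in for a fixed, structure-derived list
toy_hla = "GSHSMRYF"                # short toy fragment, not a full HLA sequence
toy_peptide = "SLYNTVATL"           # a 9-residue example peptide

# Pseudo-sequence style: select fixed positions first, then encode.
pseudo = pseudo_sequence(toy_hla, HYPOTHETICAL_POSITIONS)
pseudo_input = one_hot(pseudo)

# Raw-sequence style: encode every residue and leave selection to the model.
raw_input = one_hot(toy_hla)
peptide_input = one_hot(toy_peptide)
```

In the pseudo-sequence path the choice of positions is fixed before training and depends on structural knowledge, whereas the raw-sequence path passes every residue to the model, which must then learn which parts of the HLA sequence matter for binding.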

Methods

Results

Conclusion