Abstract

Many biological processes are governed by protein-ligand interactions. One such example is the recognition of self and non-self cells by the immune system. This immune response process is regulated by the major histocompatibility complex (MHC) protein, which is encoded by the human leukocyte antigen (HLA) complex. Understanding the binding potential between MHC and peptides can lead to the design of more potent, peptide-based vaccines and immunotherapies for infectious autoimmune diseases. We apply machine learning techniques from the natural language processing (NLP) domain to address the task of MHC-peptide binding prediction. More specifically, we introduce a new distributed representation of amino acids, named HLA-Vec, that can be used for a variety of downstream proteomic machine learning tasks. We then propose a deep convolutional neural network architecture, named HLA-CNN, for the task of HLA class I-peptide binding prediction. Experimental results show that combining the new distributed representation with our HLA-CNN architecture achieves state-of-the-art results on the majority of the latest two Immune Epitope Database (IEDB) weekly automated benchmark datasets. We further apply our model to predict binding on the human genome and identify 15 genes with potential for self binding. Code to generate HLA-Vec and HLA-CNN is publicly available at: https://github.com/uci-cbcl/HLA-bind . Contact: xhx@ics.uci.edu. Supplementary data are available at Bioinformatics online.

Highlights

  • Major histocompatibility complex (MHC) proteins are cell-surface proteins that bind intracellular peptide fragments and display them on the cell surface for recognition by T-cells [Janeway et al, 2001]

  • We propose a model that learns a vector space distributed representation of amino acids from a human leukocyte antigen (HLA) class I dataset

  • We describe our deep learning method and how it uses this new distributed representation of amino acids to solve the problem of HLA class I-peptide binding prediction
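
As a rough illustration of what learning a distributed representation of amino acids can involve, the sketch below treats each peptide as a sentence and each amino acid as a word, then builds the (center, context) pairs that a skip-gram style embedding model would train on. This is only a sketch under stated assumptions: the text above does not specify the training procedure, so the skip-gram framing, the window size, the function name `skipgram_pairs`, and the example peptide strings are all hypothetical and are not taken from the HLA-Vec code.

```python
from collections import Counter


def skipgram_pairs(peptide, window=2):
    """Return (center, context) amino-acid pairs within `window` positions.

    Assumption: a word2vec-style skip-gram setup; the source text does not
    state how HLA-Vec is actually trained.
    """
    pairs = []
    for i, center in enumerate(peptide):
        lo = max(0, i - window)
        hi = min(len(peptide), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a residue is not its own context
                pairs.append((center, peptide[j]))
    return pairs


# Made-up 9-mer strings for illustration only (not from any real dataset).
peptides = ["ACDEFGHIK", "KLMNPQRST"]

# Co-occurrence counts over all peptides; an embedding model would be
# trained to predict the context residue from the center residue.
pair_counts = Counter(
    pair for pep in peptides for pair in skipgram_pairs(pep, window=2)
)
```

The resulting pairs could then be fed to any embedding trainer; the choice of window size controls how much sequence context each residue sees.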

Summary

Introduction

Major histocompatibility complex (MHC) proteins are cell-surface proteins that bind intracellular peptide fragments and display them on the cell surface for recognition by T-cells [Janeway et al, 2001]. The human leukocyte antigen (HLA) gene complex encodes these MHC proteins. HLAs display a high degree of polymorphism, a variability maintained through the need to successfully process a wide range of foreign peptides [Jin et al, 2003, Williams, 2001]. There are different classes of HLAs, including class I, II, and III, corresponding to their location in the encoding region. Foreign antigens presented by class I HLAs attract killer T-cells and provoke an immune response. Class II HLAs are found only on antigen-presenting cells, such as mononuclear phagocytes and B cells, and present antigens from extracellular proteins [Ulvestad et al, 1994]. Unlike class I and II, class III HLAs encode proteins important for inflammation.
