Prediction of Epitope-Associated TCR by Using Network Topological Similarity Based on Deepwalk

Jingshu Bi, Sujuan Hou, Yuanjie Zheng, Chengjiang Li, Fang Yan

doi:10.1109/access.2019.2948178

Abstract

Currently, many online tools are available for T-cell epitope prediction. They usually focus on the binding of peptides to major histocompatibility complex (MHC) molecules on the surface of antigen-presenting cells (APCs). However, the binding of the peptide-MHC complex to the T-cell receptor (TCR) is also critical for the immune process. Identifying which TCRs bind human epitopes will be useful for developing vaccines and has great prospects for medical problems such as cancer and autoimmune diseases. We propose a similarity-based TCR-epitope prediction method. This paper introduces the Deepwalk method to calculate the topological similarity between TCRs, constructs a TCR similarity network topology, and predicts TCR-epitope associations from known TCR-epitope associations. We selected data for 22 types of epitopes from the VDJDB database and trained models in two settings. In the first setting, a single model trained on all 22 epitope types predicts which epitope each TCR binds. In the second setting, used for comparison with other methods, a separate model is trained for each epitope type to predict which TCRs in a large pool bind that epitope. We evaluated our models with receiver operating characteristic (ROC) curves, precision-recall (PR) curves and other evaluation indicators under 10-fold cross-validation. In the first setting, the AUC of our method is 0.926, compared with 0.924 for the support vector machine (SVM) method. Because no previous method has used the first setting, we used the second setting for comparison with existing methods. In this setting, our method shows better predictive performance than the SVM, TCRGP and random forest methods, with AUC values ranging from 0.660 to 0.950. These results show that our method outperforms the other methods in TCR-epitope prediction and can help identify TCR-epitope associations.

Highlights

The T-cell receptor (TCR) is a characteristic molecular marker found on the T-cell surface
If the peptide-major histocompatibility complex (MHC) complex is recognized by the TCR, it can trigger an immune response by the T cell
The results indicate that constructing a TCR similarity topology network contributes to TCR-epitope prediction

Summary

INTRODUCTION

The T-cell receptor (TCR) is a characteristic molecular marker found on the T-cell surface. An existing random forest method predicts TCR epitope specificity by extracting features from TCR amino acid sequences and feeding them to a random forest classifier. Here, a TCR-epitope topology network, built from TCR-epitope associations verified by biological experiments and from computed TCR sequence similarities, contributes to TCR-epitope prediction. Using a deep learning method, Deepwalk [29], we extracted features of the vertices in this topology and predicted TCR epitopes from the known associations.

TCR-TCR SIMILARITY SPACE

We use a distance-based method, such as physicochemical differences between amino acids or sequence alignment with a per-residue gap penalty, to construct a topological network of TCRs. The dashed line indicates that this distance-based approach is used to construct the topology similarity network for TCR sequences. The similarity is normalized to the [0, 1] interval by computing a local alignment score and dividing it by the minimum of the two sequences' self-alignment scores.
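The normalization described above can be written as S(a, b) = SW(a, b) / min(SW(a, a), SW(b, b)), where SW is a local-alignment score. The sketch below is a minimal illustration, assuming a Smith-Waterman local alignment with a simple identity scoring scheme (match +2, mismatch -1, linear gap -2) in place of whatever physicochemical or substitution-matrix scores the authors used; the function names and scores are hypothetical. Under identity scoring the ratio is guaranteed to stay in [0, 1], because a cross-alignment can never contain more matched columns than the shorter sequence.

```python
from typing import List, Sequence

# Hypothetical identity scoring scheme; the paper's actual scores are not given here.
MATCH, MISMATCH, GAP = 2, -1, -2


def smith_waterman(a: str, b: str) -> int:
    """Best local-alignment score between a and b with a linear gap penalty."""
    prev = [0] * (len(b) + 1)  # previous DP row
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            # Local alignment: every cell is floored at 0.
            curr[j] = max(0, diag, prev[j] + GAP, curr[j - 1] + GAP)
            best = max(best, curr[j])
        prev = curr
    return best


def normalized_similarity(a: str, b: str) -> float:
    """Alignment score divided by the smaller of the two self-alignment scores."""
    denom = min(smith_waterman(a, a), smith_waterman(b, b))
    return smith_waterman(a, b) / denom if denom > 0 else 0.0


def similarity_matrix(seqs: Sequence[str]) -> List[List[float]]:
    """Pairwise normalized similarities, usable as weights of a TCR similarity network."""
    return [[normalized_similarity(a, b) for b in seqs] for a in seqs]
```

For example, `normalized_similarity("CASSL", "CASQL")` gives 0.7 (one mismatch in five aligned residues), while a sequence contained in a longer one scores 1.0 because the shorter self-score is the denominator.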

DEEPWALK SIMILARITY LEARNING
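Deepwalk [29], named in the introduction, learns vertex features by generating truncated random walks over a network and then training a skip-gram language model on those walks as if they were sentences. The sketch below covers only the walk-generation stage, plus a cosine similarity for comparing the resulting vertex vectors. The adjacency-list graph type, the function names, the default walk parameters, and the use of cosine similarity as the "topological similarity" are assumptions for illustration, not the authors' settings.

```python
import math
import random
from typing import Dict, List, Sequence

Graph = Dict[str, List[str]]  # adjacency list: vertex -> neighbouring vertices


def random_walk(graph: Graph, start: str, length: int, rng: random.Random) -> List[str]:
    """One truncated random walk that steps to a uniformly chosen neighbour each time."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph.get(walk[-1], [])
        if not neighbours:  # isolated vertex: stop the walk early
            break
        walk.append(rng.choice(neighbours))
    return walk


def build_walk_corpus(graph: Graph, walks_per_vertex: int = 10,
                      walk_length: int = 40, seed: int = 0) -> List[List[str]]:
    """Deepwalk-style corpus: each pass shuffles the vertices and starts one walk at each."""
    rng = random.Random(seed)
    vertices = list(graph)
    corpus: List[List[str]] = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)
        for v in vertices:
            corpus.append(random_walk(graph, v, walk_length, rng))
    return corpus


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

The walks would then be passed as sentences to a skip-gram implementation, for instance gensim's `Word2Vec` with `sg=1` (and `hs=1` for the hierarchical softmax used in the original Deepwalk paper); that training step is not shown here.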

CLASSIFICATION ALGORITHM AND VALIDATION
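As a hedged illustration of the per-epitope setting described in the abstract (ranking a pool of TCRs for one epitope and scoring the ranking by AUC under 10-fold cross-validation), the sketch below scores each held-out TCR by its maximum similarity to the known binders in the training folds. The nearest-neighbour scoring rule, the pooled AUC, and all function names are hypothetical stand-ins, not the authors' classifier; `sim` can be any pairwise TCR similarity, such as one derived from Deepwalk embeddings.

```python
import random
from typing import Callable, List, Sequence

Similarity = Callable[[str, str], float]


def epitope_score(query: str, known_binders: Sequence[str], sim: Similarity) -> float:
    """Hypothetical scoring rule: similarity of the query TCR to its nearest known binder."""
    return max((sim(query, t) for t in known_binders), default=0.0)


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def cross_validated_auc(tcrs: Sequence[str], labels: Sequence[int], sim: Similarity,
                        k: int = 10, seed: int = 0) -> float:
    """k-fold CV: each held-out TCR is scored only against binders in the training folds."""
    order = list(range(len(tcrs)))
    random.Random(seed).shuffle(order)
    folds: List[List[int]] = [order[f::k] for f in range(k)]
    scores = [0.0] * len(tcrs)
    for fold in folds:
        held_out = set(fold)
        known = [tcrs[i] for i in range(len(tcrs)) if i not in held_out and labels[i] == 1]
        for i in fold:
            scores[i] = epitope_score(tcrs[i], known, sim)
    # Pool the held-out scores from all folds into a single AUC.
    return auc(scores, labels)
```

Averaging per-fold AUCs instead of pooling is an equally common choice; pooling is used here only because it avoids folds that contain no positive or no negative example.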

CONCLUSION