Abstract

Accurate identification of immunogenic regions in a given antigen chain is a difficult and actively pursued problem. Although accurate predictors for T-cell epitopes are already in place, the prediction of B-cell epitopes requires further research. We review the available approaches for the prediction of B-cell epitopes and propose a novel and accurate sequence-based solution. Our BEST (B-cell Epitope prediction using Support vector machine Tool) method predicts epitopes from full antigen sequences, in contrast to some methods that predict only from short sequence fragments. It uses a new architecture based on averaging selected scores generated from sliding 20-mers by a Support Vector Machine (SVM). The SVM predictor utilizes a comprehensive, custom-designed set of inputs generated by combining information derived from the chain, sequence conservation, similarity to known (training) epitopes, and predicted secondary structure and relative solvent accessibility. Empirical evaluation on benchmark datasets demonstrates that BEST outperforms several modern sequence-based B-cell epitope predictors, including ABCPred, the method by Chen et al. (2007), BCPred, COBEpro, BayesB, and CBTOPE, when considering predictions both from full antigen chains and from chain fragments. Our method obtains a cross-validated area under the receiver operating characteristic curve (AUC) of 0.81 and 0.85 for the fragment-based prediction, depending on the dataset. The AUCs of BEST on the two benchmark sets of full antigen chains equal 0.57 and 0.6, which are, respectively, significantly and slightly better than those of the next best method we tested. We also present case studies to contrast the propensity profiles generated by BEST and several other methods.
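To make the AUC values quoted above easier to interpret, the following minimal sketch computes the AUC as the probability that a randomly chosen positive (epitope) example scores higher than a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data or code from the paper. Under this definition a random ranking gives 0.5 and a perfect ranking gives 1.0.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC via pairwise comparison of positives and negatives.

    Equals the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up numbers): 1 = epitope, 0 = non-epitope.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In the toy example three of the four positive/negative pairs are ranked correctly, so the result is 0.75. The pairwise loop is quadratic in the number of examples; it is chosen here for clarity rather than speed.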

Highlights

  • Identification of immunogenic regions/segments in a given antigen protein chain finds important applications in immunotherapies [1,2]

  • Comparison on the fragment-based datasets: we evaluate the results generated by our support vector machine (SVM) models, using both the model with all 198 features and the model with the selected 84 features, on two benchmark fragment-based datasets, BCPREDFrag and ChenFrag

  • Our BEST method predicts epitopes from full protein chains using a novel approach based on averaging selected scores generated from 20-mers by an SVM-based predictor
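The idea of turning 20-mer scores into a per-residue profile over a full chain can be sketched as follows. This is only an illustration under stated assumptions: `score_window` is a hypothetical stand-in for the trained SVM, `toy_score` is a made-up scorer, and the `keep_top` rule (averaging only the highest-scoring windows that cover a residue) is an assumed selection rule, since the text here does not specify how the scores are selected.

```python
from typing import Callable, List

WINDOW = 20  # 20-mer window length, as stated in the text


def residue_profile(
    sequence: str,
    score_window: Callable[[str], float],
    keep_top: int = 5,
) -> List[float]:
    """Per-residue propensity from averaged sliding-window scores.

    score_window: hypothetical placeholder for the SVM 20-mer scorer.
    keep_top: ASSUMED selection rule; only the `keep_top` highest scores
    among the windows covering a residue are averaged.
    """
    n = len(sequence)
    if n < WINDOW:
        raise ValueError("sequence is shorter than the window length")

    # Score every 20-mer once; window i covers residues i .. i + WINDOW - 1.
    window_scores = [
        score_window(sequence[i:i + WINDOW]) for i in range(n - WINDOW + 1)
    ]

    profile = []
    for pos in range(n):
        first = max(0, pos - WINDOW + 1)  # earliest window covering pos
        last = min(pos, n - WINDOW)       # latest window covering pos
        covering = sorted(window_scores[first:last + 1], reverse=True)
        selected = covering[:keep_top]
        profile.append(sum(selected) / len(selected))
    return profile


def toy_score(fragment: str) -> float:
    """Toy scorer (fraction of charged/polar residues); NOT the SVM."""
    return sum(aa in "DEKRNQ" for aa in fragment) / len(fragment)


# Usage on a made-up 40-residue sequence.
toy_sequence = "A" * 20 + "K" * 20
toy_profile = residue_profile(toy_sequence, toy_score, keep_top=1)
```

Each residue receives one value regardless of its position in the chain. Residues near the termini are covered by fewer windows, so their values rest on fewer scores.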


Summary

Introduction

Identification of immunogenic regions/segments in a given antigen protein chain finds important applications in immunotherapies [1,2]. The newest sequence-based predictors of continuous B-cell epitopes exclusively use support vector machine (SVM) models. They include: (1) a method by Chen et al. [20] that predicts 20-mer peptides using a new amino acid (AA) pair-based antigenicity scale; (2) BCPred [21], which predicts 12-, 14-, 16-, 18-, 20-, and 22-mer epitopes directly from sequence using a new type of string kernel-based SVM; (3) COBEpro [22], which utilizes a two-stage design in which an SVM takes novel sequence similarity scores as inputs to predict variable-size peptides in the first stage, and a second stage combines these fragments to predict epitopes in full chains; and (4) the BayesB method [23], which predicts epitopes of diverse lengths (from 12- to 20-mers) using a position-specific scoring matrix (PSSM) generated with PSI-BLAST [24].

