Prediction of B-cell epitopes using evolutionary information and propensity scales

Cheng-Wei Cheng, Emily Chia-Yu Su, Scott Yi-Heng Lin

doi:10.1186/1471-2105-14-s2-s10

Open Access

Abstract

Background: Development of computational tools that can accurately predict the presence and location of B-cell epitopes on pathogenic proteins has a valuable application in the field of vaccinology. Because of the highly variable yet enigmatic nature of B-cell epitopes, their prediction presents a great challenge to computational immunologists.

Methods: We propose a method, BEEPro (B-cell epitope prediction by evolutionary information and propensity scales), which applies a linear averaging scheme to 16 properties and uses a support vector machine model to predict both linear and conformational B-cell epitopes. These 16 properties comprise a position-specific scoring matrix (PSSM), an amino acid ratio scale, and a set of 14 physicochemical scales obtained via a feature selection process. Finally, a three-way data split procedure is used during validation to prevent over-estimation of prediction performance and to avoid bias in our experimental results.

Results: In our experiments, we first use a non-redundant linear B-cell epitope dataset curated by Sollner et al. for feature selection and parameter optimization. Evaluated by a three-way data split procedure, BEEPro achieves a significant improvement on the Sollner dataset, with area under the receiver operating characteristic curve (AUC) = 0.9987, accuracy = 99.29%, Matthews correlation coefficient (MCC) = 0.9281, sensitivity = 0.9604, specificity = 0.9946, and positive predictive value (PPV) = 0.9042. In addition, when the same parameters are used to evaluate performance on other independent linear B-cell epitope test datasets, BEEPro attains an AUC ranging from 0.9874 to 0.9950 and an accuracy ranging from 93.73% to 97.31%. Moreover, five-fold cross-validation on one benchmark conformational B-cell epitope dataset yields an accuracy of 92.14% and an AUC of 0.9066.

Conclusions: Compared with other current models, our method achieves a significant improvement with respect to AUC, accuracy, MCC, sensitivity, specificity, and PPV. Thus, we have shown that an appropriate combination of evolutionary information and propensity scales with a support vector machine model can significantly enhance the prediction performance for both linear and conformational B-cell epitopes.
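
For readers unfamiliar with the threshold-dependent measures reported in the abstract, the following minimal Python sketch shows how accuracy, sensitivity, specificity, PPV, and MCC are conventionally derived from a binary confusion matrix. The function name `confusion_metrics` and the example counts are hypothetical and are not taken from the paper; the formulas are the standard definitions, not necessarily the authors' exact evaluation code.

```python
import math


def _ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fn: epitope residues predicted as epitope / non-epitope.
    tn/fp: non-epitope residues predicted as non-epitope / epitope.
    """
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),   # true positive rate
        "specificity": _ratio(tn, tn + fp),   # true negative rate
        "ppv": _ratio(tp, tp + fp),           # positive predictive value
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts for illustration only; not results from the paper.
metrics = confusion_metrics(tp=90, fp=10, tn=880, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Because epitope residues are usually far rarer than non-epitope residues, accuracy alone can look high even for a weak predictor, which is one reason MCC and PPV are typically reported alongside it.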

Highlights

Development of computational tools that can accurately predict presence and location of B-cell epitopes on pathogenic proteins has a valuable application to the field of vaccinology
Prediction based on a single propensity scale or position-specific scoring matrix: in general, the prediction performance of single propensity scale methods improves as the window size increases, with the exception of the accessible surface area scale, whose performance decreases as window size increases, and the polarity scale, whose performance fluctuates across window sizes (Figure 1; Additional File 8: Supplementary Table 1)
The amino acid ratio propensity scale (AUC = 0.6090) outperforms the four physicochemical scales regardless of window size, and this gives us confidence to use this scale for the later hybrid model
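
The window-size effect described in the highlights above can be illustrated with a minimal Python sketch of window-based linear averaging over a single propensity scale. This is a sketch under stated assumptions, not the BEEPro implementation: the function name `window_average`, the toy scale values, the centered window, and the truncation of the window at sequence ends are all hypothetical choices that the excerpt does not specify.

```python
def window_average(sequence, scale, window=7, default=0.0):
    """Score each residue as the mean scale value over a centered window.

    sequence: protein sequence as a string of one-letter residue codes.
    scale:    dict mapping residue code -> propensity value.
    window:   odd window size; the window is truncated at the sequence
              ends (an assumption, not a detail taken from the paper).
    default:  value used for residues missing from the scale.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    values = [scale.get(residue, default) for residue in sequence]
    scores = []
    for i in range(len(values)):
        start, end = max(0, i - half), min(len(values), i + half + 1)
        segment = values[start:end]
        scores.append(sum(segment) / len(segment))
    return scores


# Hypothetical toy scale for illustration; not a published propensity scale.
TOY_SCALE = {"A": 1.0, "G": 0.0, "K": 2.0}
scores = window_average("AGKGA", TOY_SCALE, window=3)
print(scores)
```

A larger window smooths the per-residue signal over more neighbours, so repeating the call with different `window` values is one simple way to reproduce the kind of window-size comparison the highlights refer to.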

Summary

Introduction

Development of computational tools that can accurately predict the presence and location of B-cell epitopes on pathogenic proteins has a valuable application in the field of vaccinology. The use of peptide-based vaccines to replace live or attenuated whole-pathogen vaccines is an emerging field, as peptide-based vaccines can offer greater safety, potency, and elegance in drug design and delivery [1]. The development of these peptide-based vaccines requires first. As the development of vaccines is critical to our protection against infectious diseases, effective screening methods to identify immunogenic epitopes from the pathogenic proteome will be necessary. Classical methods such as the phage display system have successfully yielded peptides that have proceeded to clinical trials, yet these experimental techniques are labour-intensive and may not reflect in vivo binding conditions or the biological ability to stimulate antibody production [2,3]. The shortcomings of current experimental methods call for the development of new computational models that can more effectively predict the presence and location of immunogenic (protective) epitopes given a pathogenic protein sequence.

Methods

Results

Discussion

Conclusion