Abstract

Identification of B-cell epitopes (BCEs) is a fundamental step for epitope-based vaccine development, antibody production, and disease prevention and diagnosis. Due to the avalanche of protein sequence data discovered in postgenomic age, it is essential to develop an automated computational method to enable fast and accurate identification of novel BCEs within vast number of candidate proteins and peptides. Although several computational methods have been developed, their accuracy is unreliable. Thus, developing a reliable model with significant prediction improvements is highly desirable. In this study, we first constructed a non-redundant data set of 5,550 experimentally validated BCEs and 6,893 non-BCEs from the Immune Epitope Database. We then developed a novel ensemble learning framework for improved linear BCE predictor called iBCE-EL, a fusion of two independent predictors, namely, extremely randomized tree (ERT) and gradient boosting (GB) classifiers, which, respectively, uses a combination of physicochemical properties (PCP) and amino acid composition and a combination of dipeptide and PCP as input features. Cross-validation analysis on a benchmarking data set showed that iBCE-EL performed better than individual classifiers (ERT and GB), with a Matthews correlation coefficient (MCC) of 0.454. Furthermore, we evaluated the performance of iBCE-EL on the independent data set. Results show that iBCE-EL significantly outperformed the state-of-the-art method with an MCC of 0.463. To the best of our knowledge, iBCE-EL is the first ensemble method for linear BCEs prediction. iBCE-EL was implemented in a web-based platform, which is available at http://thegleelab.org/iBCE-EL. iBCE-EL contains two prediction modes. The first one identifying peptide sequences as BCEs or non-BCEs, while later one is aimed at providing users with the option of mining potential BCEs from protein sequences.

Highlights

  • The humoral immune system is a complex network of cells that work together to protect the body against foreign substances or antigens such as bacteria, viruses, fungi, parasites, and cancerous cells

  • We investigated six different machine learning (ML) algorithms [support vector machine (SVM), random forest (RF), extremely randomized tree (ERT), AdaBoost (AB), gradient boosting (GB), and k-nearest neighbors (k-NN)], five compositions [amino acid composition (AAC), amino acid index (AAI), dipeptide composition (DPC), chain-transition-distribution (CTD), and physicochemical properties (PCP)], 23 hybrid features, and six binary profiles (BPF)

  • We explored six different ML algorithms, including SVM, RF, ERT, GB, AB, and k-NN, using five different encoding schemes (AAC, AAI, CTD, DPC, and PCP) and their combinations (17 hybrid features), which included H1 (AAC + AAI); H2 (AAC + DPC + AAI); H3 (AAC + DPC + AAI + CTD); H4 (AAC + DPC + AAI + CTD + PCP); H5 (AAC + DPC); H6 (AAC + CTD); H7 (AAC + PCP); H8 (AAI + DPC); H9 (AAI + DPC + CTD); H10 (AAI + DPC + CTD + PCP); H11 (AAI + CTD); H12 (AAI + PCP); H13 (DPC + CTD); H14 (DPC + CTD + PCP); H15 (DPC + PCP); H16 (CTD + DPC); and H17 (AAC + AAI + PCP)

Read more

Summary

Introduction

The humoral immune system is a complex network of cells that work together to protect the body against foreign substances or antigens such as bacteria, viruses, fungi, parasites, and cancerous cells. Antigens are larger in size, only certain parts of antigenic determinants, called B-cell epitopes (BCEs), are recognized by specific receptors on the B-cell surface, genera­ting soluble forms of antigen-specific antibodies [1]. Experimental methods to identify BCEs include X-ray crystallography, cryo-EM, nuclear mag­ne­ tic resonance, hydrogen–deuterium exchange coupled to mass spectroscopy, peptide-based approaches, mutagenesis, and antigen fragmentation [5, 10]. These methods could be expensive and time-consuming. LBtope is the most recently developed one and is publicly available

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.