Abstract

A conformational epitope is a component of protein-based vaccines, and it is difficult to identify experimentally. Computational models are developed to support identification. However, class imbalance is one of the constraints on achieving optimal performance in conformational B-cell epitope prediction. In this paper, we compare several conformational B-cell epitope prediction models built with non-ensemble and ensemble approaches. For the non-ensemble models, a sampling method (random undersampling, SMOTE, or cluster-based undersampling) is combined with a decision tree or an SVM. For the ensemble models, a random forest and several variants of the bagging method are used. The models are validated with 10-fold cross-validation. The experimental results show that the combination of cluster-based undersampling and a decision tree outperformed the other sampling methods when combined with the non-ensemble and the ensemble methods. This study provides a baseline for improving existing models that deal with class imbalance in conformational epitope prediction.

Highlights

  • The development of computational methods for epitope prediction has been an active research area for more than 30 years

  • Per-class performance is reported as the True Positive Rate (TPR) and True Negative Rate (TNR)

  • The overall model performance is expressed by Area Under the Curve (AUC), Geometric mean (Gmean), Adjusted Graph, and F-score
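As a rough illustration of the measures listed above, the sketch below computes TPR, TNR, Gmean, and F-score from binary labels. It uses the standard textbook definitions, not necessarily the exact formulation used in the paper. The function name `imbalance_metrics` and the toy labels are hypothetical, and AUC is omitted because it needs ranked scores rather than hard predictions.

```python
import math


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return TPR, TNR, Gmean and F-score for binary labels (illustrative only)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Per-class rates: recall on the positive (epitope) and negative classes.
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0

    # Gmean rewards balanced performance across both classes.
    gmean = math.sqrt(tpr * tnr)
    # F-score is the harmonic mean of precision and recall (TPR).
    f_score = (2 * precision * tpr / (precision + tpr)) if (precision + tpr) else 0.0
    return {"tpr": tpr, "tnr": tnr, "gmean": gmean, "f_score": f_score}


# Toy imbalanced example (assumed data): 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
metrics = imbalance_metrics(y_true, y_pred)
```

A classifier that labels everything as the majority class would reach 80% accuracy on this toy data but a Gmean of 0, which is why per-class rates and Gmean are preferred for imbalanced problems.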


Summary

Introduction

The development of computational methods for epitope prediction has been an active research area for more than 30 years. Conformational epitope prediction started with CEP, which uses solvent accessibility properties [1]. [...] CN, HSE), and statistics (log odds ratio), which have been implemented to improve the performance of the model. Still, according to [7], among ensemble approaches the bagging method is superior to other methods such as boosting and cost-sensitive learning. Some sampling approaches have been implemented in conformational epitope prediction models [2,3], [5]. Another approach is the cost-sensitive method [6]. The cost-sensitive method is superior to several ensemble methods, both boosting and hybrids of boosting and bagging (EasyEnsemble and BalanceCascade [8]). The performance of modified bagging models in conformational epitope prediction is unknown. The study in [9] shows that bagging extensions can improve model performance on class imbalance problems.
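To make the idea of combining sampling with bagging concrete, the sketch below builds class-balanced bootstrap bags, one common way to adapt bagging to imbalanced data. It is an assumption for illustration, not the specific bagging variants evaluated in the paper. The function name `balanced_bags`, its parameters, and the toy labels are hypothetical.

```python
import random


def balanced_bags(y, n_bags=10, minority=1, seed=0):
    """Return index lists for class-balanced bootstrap bags (illustrative only)."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    majority_idx = [i for i, label in enumerate(y) if label != minority]
    if not minority_idx or not majority_idx:
        raise ValueError("both classes must be present")

    k = len(minority_idx)  # every bag holds k samples from each class
    bags = []
    for _ in range(n_bags):
        # Sample with replacement from each class so bags differ from one another.
        bag = rng.choices(minority_idx, k=k) + rng.choices(majority_idx, k=k)
        bags.append(bag)
    return bags


# Toy labels (assumed data): 3 epitope residues, 17 non-epitope residues.
y = [1] * 3 + [0] * 17
bags = balanced_bags(y, n_bags=5)
```

Each bag would then train one base learner, such as a decision tree, and the ensemble prediction would be a majority vote over the learners.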
