Application of Machine Learning Algorithms for Prediction of Tumor T-Cell Immunogens

Stanislav Sotirov,Ivan Dimitrov

doi:10.3390/app14104034

Abstract

The identification and characterization of immunogenic tumor antigens are essential for cancer vaccine development. In light of the impracticality of isolating and evaluating each putative antigen individually, in silico prediction algorithms, particularly those utilizing machine learning (ML) approaches, play a pivotal role. These algorithms significantly reduce the experimental workload necessary for discovering vaccine candidates. In this study, we employed six supervised ML methods on a dataset comprising 212 experimentally validated human tumor peptide antigens and an equal number of non-antigenic human peptides to develop models for immunogenicity prediction. These methods encompassed k-nearest neighbor (kNN), linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), support vector machine (SVM), random forest (RF), and extreme gradient boosting (XGBoost). The models underwent validation through internal cross-validation within 10 groups from the training set and were further assessed using an external test set. Remarkably, the kNN model demonstrated superior performance, recognizing 90% of the known immunogens in the test set. The RF model excelled in the identification of non-immunogens, accurately classifying 93% of them in the test set. The three top-performing ML models according to multiple evaluation metrics (SVM, RF, and XGBoost) are to be subsequently integrated into the new version of the VaxiJen server, facilitating tumor antigen prediction through a majority voting mechanism.

Full Text