Abstract

B-cells that induce antigen-specific immune responses in vivo produce large numbers of antigen-specific antibodies by recognizing subregions (epitopes) of antigenic proteins, and can thereby inhibit the function of the antigenic protein. Predicting epitope regions facilitates the design and development of vaccines that induce the production of antigen-specific antibodies. Many diseases are difficult to treat without vaccines, and COVID-19 has destroyed many people's lives, so developing COVID-19 vaccines is very important. Vaccine development requires a large number of experiments to obtain labeled targets, and obtaining that much labeled data experimentally is a challenge. Big data analysis, which has developed rapidly and has been applied in many areas, offers some solutions. In bioinformatics it has solved a large number of problems, particularly through active learning. Active learning builds more predictive models from less labeled data by asking an oracle (a human) to label only the samples that are most valuable for training. Applying active learning to vaccine development is therefore meaningful, because scientists do not need to run as many experiments. This paper proposes a more robust active learning method based on uncertainty sampling and K-nearest density and applies it to vaccine production. The new algorithm is evaluated on accuracy and robustness, and a new robustness index is designed to evaluate the robustness of active learners. The new algorithm is compared with a pool-based active learning algorithm, a density-weighted active learning algorithm, and a traditional machine learning algorithm. Finally, the new algorithm is applied to epitope prediction on B-cell data, which is significant for vaccine development.
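The query step of uncertainty sampling described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a binary classifier that outputs an epitope probability for each unlabeled sample, and it uses Shannon entropy as the uncertainty measure. The names `binary_entropy`, `select_query`, and the probabilities in `pool_probs` are hypothetical.

```python
import math


def binary_entropy(p):
    """Shannon entropy (in bits) of a binary prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a fully confident prediction carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def select_query(pool_probs):
    """Return the index of the unlabeled sample with the most uncertain prediction."""
    return max(range(len(pool_probs)), key=lambda i: binary_entropy(pool_probs[i]))


# Hypothetical predicted epitope probabilities for four unlabeled samples.
pool_probs = [0.95, 0.52, 0.10, 0.70]

# Index 1 (p = 0.52) is closest to 0.5, so it is the sample sent to the oracle.
query_index = select_query(pool_probs)
```

In a full pool-based loop, the queried sample would be labeled by the oracle, moved from the pool to the training set, and the model retrained before the next query.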

Highlights

  • Big data analysis is a thriving field

  • Because B-cell specimens are not difficult to obtain, applying epitope prediction in collaborative health systems is a good way to assess whether people are suffering from COVID-19

  • The new algorithm can reduce the complexity of density-weighted pool-based active learners such as SUD when facing big data

Summary

Introduction

Big data analysis is a thriving field. One of its branches, artificial intelligence, has greatly advanced the understanding of life science in the field of bioinformatics [1, 2]. Outliers are not very valuable and may result in less robust classifiers when they are added to the training data as new samples. To solve this problem, density-weighted sampling has been proposed. Some methods have developed new loss functions that integrate uncertainty sampling and K-nearest density weighting to improve the performance of active learning [29,30,31]. Our proposal is a new algorithm that predicts epitopes with less labeled data and higher accuracy than the existing pool-based and density-weighted active learning algorithms for the epitope prediction problem. By experimenting with KRAL (K-Nearest Robust Active Learning) and comparing it with pool-based active learning and the density-weighted method on B-cell data, we obtain a more accurate and robust model with less complexity. The results of this study may be helpful in the production of COVID-19 vaccines.
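To illustrate why density weighting steers queries away from outliers, the sketch below multiplies each sample's prediction entropy by a K-nearest density term. It is an assumption-laden example and not the paper's KRAL formula, which is not given in this summary: the density here is taken to be 1 / (1 + mean Euclidean distance to the k nearest neighbours), and the function names, feature points, and probabilities are all hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a binary prediction with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def knn_density(points, i, k):
    """Assumed density: 1 / (1 + mean distance from points[i] to its k nearest neighbours)."""
    dists = sorted(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
    nearest = dists[:k]
    return 1.0 / (1.0 + sum(nearest) / len(nearest))


def density_weighted_query(points, probs, k=2):
    """Return the index that maximizes uncertainty multiplied by local density."""
    scores = [entropy(p) * knn_density(points, i, k) for i, p in enumerate(probs)]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical 2-D features: three clustered samples and one outlier at (5, 5).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
# Hypothetical predicted probabilities; the outlier is the most uncertain (0.5).
probs = [0.6, 0.9, 0.9, 0.5]

plain_choice = max(range(len(probs)), key=lambda i: entropy(probs[i]))
weighted_choice = density_weighted_query(points, probs, k=2)
```

Plain uncertainty sampling picks the outlier (index 3), while the density-weighted score picks index 0, an uncertain sample inside the dense cluster. Computing distances from every sample to every other sample is quadratic in the pool size, which is the kind of cost that restricting the computation to K nearest neighbours is meant to keep manageable.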

Data and Methodology
Active Learning Process
Uncertainty Measures
K-Nearest Robust Active Learning
Experiments and comparing
Result and Analysis
Findings
Conclusion and Discussion