Abstract

In many practical applications, data annotation is costly, so training datasets contain many unlabeled samples and only a few labeled ones. At the same time, network data are class-imbalanced: normal behavior records far outnumber intrusion records. To address both problems, this paper proposes a semi-supervised ensemble learning algorithm for imbalanced data. The algorithm uses the relationship between class samples to define a sampling probability for each sample, and then constructs the initial training subsets and base classifiers according to these probabilities. Next, an evaluation index designed for imbalanced data is defined and used to evaluate and select the base classifiers. The selected base classifiers are then integrated by weighted voting. Finally, simulation results on UCI datasets and the NSL-KDD dataset show that the algorithm improves detection accuracy, especially the recognition rate of unknown intrusion behavior.
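
The pipeline summarized above (sampling probabilities, an imbalance-aware evaluation index, weighted voting) can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything here is an assumption made for illustration: sampling probability is taken to be inversely proportional to class frequency, the evaluation index is taken to be the geometric mean of the true positive and true negative rates, and each classifier's vote is weighted by that score. The function names are hypothetical, and the semi-supervised step (use of unlabeled samples) is not shown.

```python
import math
import random
from collections import Counter


def sampling_probabilities(labels):
    """Assumed rule: weight each sample by 1 / (size of its class), then normalize."""
    counts = Counter(labels)
    raw = [1.0 / counts[y] for y in labels]
    total = sum(raw)
    return [r / total for r in raw]


def draw_subset(labels, k, seed=0):
    """Draw k sample indices (with replacement) using the probabilities above."""
    rng = random.Random(seed)
    probs = sampling_probabilities(labels)
    return rng.choices(range(len(labels)), weights=probs, k=k)


def g_mean(y_true, y_pred, positive=1):
    """Assumed evaluation index: sqrt(true positive rate * true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


def weighted_vote(votes, weights):
    """Return the label with the largest total weight across classifiers."""
    scores = {}
    for label, w in zip(votes, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)


if __name__ == "__main__":
    labels = [0, 0, 0, 1]  # three normal records, one intrusion record
    print(sampling_probabilities(labels))
    print(draw_subset(labels, k=4))
    print(g_mean([1, 1, 0, 0], [1, 0, 0, 0]))
    print(weighted_vote([1, 0, 0], [0.9, 0.3, 0.3]))
```

Under the assumed inverse-frequency rule, each class receives the same total sampling mass, so the minority (intrusion) class is no longer swamped in the training subsets. Weighting votes by an imbalance-aware score lets one classifier that detects intrusions well outweigh several that mostly predict the majority class.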
