Abstract

Network intrusion behavior data are imbalanced: they contain a large amount of normal behavior data and only a small amount of intrusion behavior data. Applied to such data, traditional selective ensemble learning algorithms lead to a high false negative rate. This paper proposes a selective ensemble learning algorithm for imbalanced data based on under-sampling (SELAUS). First, the algorithm uses the Bootstrap method to draw from the majority class as many samples as there are minority-class samples, constructing multiple balanced training subsets. Then, to ensure that the resulting base classifiers differ substantially from one another, several features are randomly selected on each training subset and a decision tree is built as the base classifier using the CART algorithm. Because this randomization can also produce some poorly performing base classifiers, the algorithm selects a subset of the base classifiers to integrate rather than integrating all of them. To accurately evaluate the generalization error of a classifier on an imbalanced dataset, this paper defines a performance evaluation method for imbalanced datasets and a method for evaluating the difference between base classifiers. The generalization error of each base classifier is then calculated, and base classifiers are selected according to it. In the weighted-voting integration, the weight of each base classifier is computed with a weight calculation method designed for imbalanced data. Finally, the validity of the algorithm is verified using UCI data, and the algorithm is applied to network intrusion detection. The simulation results show that the algorithm improves the detection rate of minority-class samples, that is, it reduces the false negative rate.
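
The data-handling steps described above can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the choice to keep every minority sample in each subset, ranking by a plain error value, and the simple weighted vote are all assumptions made for illustration, since the abstract does not give the paper's evaluation or weighting formulas. Training a CART tree on each subset is left out; any CART implementation could be fit on the rows and feature columns produced here.

```python
import random


def balanced_subsets(labels, n_subsets, minority_label=1, seed=0):
    """Build index lists for balanced training subsets.

    Majority-class rows are drawn with replacement (bootstrap) until their
    count equals the minority-class count. Keeping every minority row in each
    subset is an assumption of this sketch.
    """
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.choices(majority, k=len(minority))  # sampling with replacement
        subsets.append(minority + drawn)
    return subsets


def random_feature_subset(n_features, k, rng):
    """Pick k distinct feature indices at random to diversify base trees."""
    return sorted(rng.sample(range(n_features), k))


def select_lowest_error(errors, keep):
    """Return indices of the `keep` base classifiers with the lowest error.

    `errors` stands in for the paper's generalization-error estimate, whose
    exact definition is not given in the abstract.
    """
    return sorted(range(len(errors)), key=errors.__getitem__)[:keep]


def weighted_vote(predictions, weights):
    """Combine one sample's predicted labels by summing weights per label."""
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)
```

For example, with 95 normal records and 5 intrusion records, `balanced_subsets(labels, n_subsets=3)` returns three index lists of 10 rows each, 5 from each class. A tree would then be trained per subset on the columns chosen by `random_feature_subset`, filtered with `select_lowest_error`, and combined with `weighted_vote`.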
