Abstract

Non-parametric nearest-neighbor search finds the closest data points under the Euclidean norm (the standard distance between two points in a multidimensional space). The classical K-Nearest Neighbor (KNN) algorithm applies this idea to find the K data points in the vicinity of a query point, then uses majority voting among their labels to assign its category. This paper proposes a modification of the original KNN that improves its accuracy by reweighting the Euclidean norm using Shannon entropy, in the context of a Network Intrusion Detection System. Shannon entropy measures the importance of each feature with respect to the class labels, and the distance between data points is then recalculated using the resulting feature weights, so that more suitable K nearest neighbors can be found. The NSL-KDD dataset is used to evaluate the performance of the proposed model. The results are compared against those of the classic KNN, related work on its improvement, and recent deep learning approaches, to evaluate its effectiveness in different scenarios. The results show that the proposed algorithm performs well on the NSL-KDD dataset: it reaches an accuracy of up to 99.73% in detecting DoS attacks, 5.46% higher than the original KNN and 1.15% higher than the related M-KNN work. Recalculating the Euclidean distance in this way retains the contribution of features with low importance to the classification, while ensuring that features with higher importance have a greater impact. The proposal therefore does not risk losing information, and achieves high efficiency in feature weighting and data classification.
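The entropy-weighted distance described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not give the exact weighting formula, so an information-gain-style weight (label entropy reduction over binned feature values) is assumed here, and all function names are illustrative.

```python
import numpy as np

def entropy(labels):
    # Shannon entropy of a label distribution
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def feature_weights(X, y, bins=10):
    # Assumed weighting: information gain of each feature, i.e. the label
    # entropy H(y) minus the entropy remaining after binning the feature.
    base = entropy(y)
    weights = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        edges = np.histogram_bin_edges(X[:, j], bins=bins)
        binned = np.digitize(X[:, j], edges)
        cond = 0.0
        for b in np.unique(binned):
            mask = binned == b
            cond += mask.mean() * entropy(y[mask])
        weights[j] = base - cond
    return weights / weights.sum()  # normalise so the weights sum to 1

def weighted_knn_predict(X_train, y_train, x, w, k=5):
    # Euclidean distance with each squared feature difference scaled by its weight
    d = np.sqrt(((X_train - x) ** 2 * w).sum(axis=1))
    nearest = y_train[np.argsort(d)[:k]]
    vals, counts = np.unique(nearest, return_counts=True)
    return vals[np.argmax(counts)]  # majority vote among the K neighbors
```

Because the weights only rescale each feature's contribution rather than dropping any feature, low-importance features still influence the distance, which is the information-preserving property the abstract highlights.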
