Abstract

As Internet technology is integrated into modern industrial production, networking makes production more intelligent and efficient but also exposes the industrial Internet to more security threats. Protecting network information security and detecting and handling abnormal data access in a timely manner are therefore essential to the safe and robust operation of the industrial Internet. An intrusion detection system (IDS), as a network security defense tool, can quickly detect and identify malicious intrusions and trigger an emergency response. In network intrusion detection based on data mining, the data that the IDS must process takes the form of either a static data set or a dynamic data stream. With static data sets, data redundancy tends to degrade the performance of data mining algorithms and consumes substantial computing and storage resources. With dynamic data streams, the limited number of observed samples may leave the data mining model unable to adapt to changes in the stream. To address the effect of data redundancy on data mining algorithms in intrusion detection for static data sets, this paper proposes a data reduction method based on a tree model. As a data preprocessing method, it uses subgroup discovery to filter the data set, reduce its size, and partition it sensibly, thereby reducing the computational cost of subsequent data mining algorithms. Experimental results on multiple data sets show that the method effectively reduces data set size.
Combined with a decision tree classification algorithm, experiments on the KDDCUP1999 intrusion detection data set show that the reduced data set yields a smaller, more compact decision tree and improves the efficiency of decision tree classification while maintaining classification accuracy. To address the inability of a data mining model built from finite samples to adapt to data changes in intrusion detection for dynamic data streams, this paper proposes an improved fast decision tree classification algorithm based on probability estimation. Built on the Very Fast Decision Tree (VFDT) framework, the algorithm incorporates two probability correction methods, Laplace smoothing and Wilson interval mean estimation, and adjusts the attribute test conditions used to select the best split attribute. Experiments on the NSL-KDD intrusion detection data set show that the improved algorithm produces a smaller, more compact fast decision tree model and improves the model's adaptability to the evolution of the data stream while preserving its predictive ability on the stream.
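
For readers unfamiliar with the quantities named above, the following is a minimal sketch of the textbook formulas only: the Laplace-smoothed probability estimate, the midpoint of the Wilson score interval, and the Hoeffding bound that VFDT uses to decide when enough samples have been seen to split. The abstract does not state how the paper combines these corrections with the attribute test, so the function names, default parameters, and the `should_split` rule (including the tie threshold) are illustrative assumptions, not the authors' algorithm.

```python
import math


def laplace_estimate(k: int, n: int, num_classes: int = 2, alpha: float = 1.0) -> float:
    """Laplace-smoothed probability of a class seen k times in n samples."""
    return (k + alpha) / (n + alpha * num_classes)


def wilson_center(k: int, n: int, z: float = 1.96) -> float:
    """Midpoint of the Wilson score interval for k successes in n trials."""
    if n == 0:
        return 0.5  # no evidence yet: fall back to an uninformative value (assumption)
    p_hat = k / n
    return (p_hat + z * z / (2 * n)) / (1 + z * z / n)


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Hoeffding bound: epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, epsilon: float,
                 tie_threshold: float = 0.05) -> bool:
    """Standard VFDT-style test (hypothetical helper): split when the best
    attribute leads the runner-up by more than epsilon, or when epsilon has
    shrunk below the tie threshold."""
    return (best_gain - second_gain > epsilon) or (epsilon < tie_threshold)


if __name__ == "__main__":
    # With few samples, both corrections pull a raw estimate of 0/10 away from zero.
    print(laplace_estimate(0, 10), wilson_center(0, 10))
    # The bound tightens as more stream samples arrive at a leaf.
    print(hoeffding_bound(1.0, 1e-7, 200), hoeffding_bound(1.0, 1e-7, 5000))
```

Both corrections matter most when a leaf has seen few samples, which is the situation a stream classifier is in most of the time; as the sample count grows, they converge to the raw frequency.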
