The Application of Deep Learning Imputation and Other Advanced Methods for Handling Missing Values in Network Intrusion Detection

Mateusz Szczepański, Marek Pawlicki, Michał Choraś, Rafał Kozik

doi:10.1142/s2196888822500257

Abstract

Data play a critical role in intelligent information systems. Missing data are among the most common problems in data collected in the real world, and the problem stems directly from the nature of data collection. This paper considers the handling of missing values in a real-world application of computational intelligence. Two experimental campaigns were conducted, evaluating different approaches to missing-value imputation for Random Forest-based classifiers trained on modern cybersecurity benchmark datasets: CICIDS2017 and IoT-23. The experiments showed that the choice of data imputation algorithm has a severe impact on the results of the classifier used for network intrusion detection. They also showed that one of the most popular approaches to handling missing data, complete case analysis, should never be used in cybersecurity.
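
To make the contrast in the abstract concrete, the following minimal sketch compares complete case analysis (dropping every record with a missing value) with simple mean imputation on a toy set of flow records. The records, the feature names, and the helper functions `complete_case` and `mean_impute` are hypothetical and chosen only for illustration. Mean imputation stands in here as the simplest possible baseline; it is not the deep learning imputation evaluated in the paper, and no classifier is trained.

```python
from statistics import mean

# Hypothetical toy flow records: (duration, bytes, label).
# None marks a missing feature value. Not taken from CICIDS2017 or IoT-23.
records = [
    (1.0, 100.0, "benign"),
    (None, 120.0, "benign"),
    (3.0, None, "attack"),
    (4.0, 400.0, "attack"),
    (None, 380.0, "attack"),
]


def complete_case(rows):
    """Complete case analysis: keep only rows with no missing feature."""
    return [r for r in rows if None not in r[:-1]]


def mean_impute(rows):
    """Replace each missing feature with the mean of the observed values
    in the same column; labels (last element) are left untouched."""
    n_features = len(rows[0]) - 1
    col_means = [
        mean(r[j] for r in rows if r[j] is not None)
        for j in range(n_features)
    ]
    return [
        tuple(col_means[j] if r[j] is None else r[j] for j in range(n_features))
        + (r[-1],)
        for r in rows
    ]


cca = complete_case(records)
imputed = mean_impute(records)

# Complete case analysis discards most of the records, imputation keeps all.
print(len(records), len(cca), len(imputed))  # prints: 5 2 5
```

In this toy example, complete case analysis keeps only two of the five records, while imputation retains every record for training. This illustrates the kind of data loss involved, not the magnitude of the effect reported in the paper.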

Full Text