This paper addresses the need for stronger security in the Internet of Things (IoT), where inherent device vulnerabilities leave devices susceptible to many forms of cyber-attack, and emphasizes the role of Intrusion Detection Systems (IDS) in continuous threat monitoring. The objective was to evaluate feature selection (FS) methods in combination with various machine learning (ML) techniques for classifying traffic flows in datasets containing intrusions in IoT environments. An extensive benchmark of ML techniques and FS methods was performed, covering three FS approaches: Filter Feature Ranking (FFR), Filter-Feature Subset Selection (FSS), and Wrapper-based Feature Selection (WFS). FS is pivotal in handling large volumes of IoT data: it removes irrelevant attributes, mitigates the curse of dimensionality, improves model interpretability, and saves resources on devices with limited capacity. Key findings indicate that certain tree-based algorithms, such as J48 and PART, outperform other ML techniques (naive Bayes, multi-layer perceptron, logistic, adaptive boosting, and k-Nearest Neighbors) in traffic flow classification, offering a good balance between performance and execution time. The advantages and drawbacks of the FS methods are discussed, highlighting the main differences in the results obtained with each approach. FSS approaches such as CFS could be more suitable than FFR, which may select correlated attributes, or than WFS methods, which may tailor attribute subsets to a specific ML technique and have lengthy execution times. In any case, reducing attributes via FS optimized classification without compromising accuracy.
In this study, F1 scores above 0.99, together with a reduction of more than 60% in the number of attributes, were achieved in most experiments across four datasets, in both binary and multiclass modes. This work underscores the importance of a balanced attribute selection process that accounts for both threat detection capability and computational complexity.
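The contrast between ranking and subset selection can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the study's code or data: the toy flow table, the column names `pkts`, `bytes_` and `flag`, and the use of Pearson correlation inside a CFS-style merit, k·r_cf / sqrt(k + k(k−1)·r_ff), searched greedily. The study's own tooling may rely on a different correlation measure and search strategy.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def rank_features(columns, label):
    """Filter ranking: score each attribute on its own, ignoring redundancy."""
    scores = [abs(pearson(col, label)) for col in columns]
    return sorted(range(len(columns)), key=lambda i: -scores[i])


def cfs_merit(subset, columns, label):
    """CFS-style merit: rewards class correlation, penalizes inter-correlation."""
    k = len(subset)
    r_cf = sum(abs(pearson(columns[i], label)) for i in subset) / k
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(columns, label):
    """Greedy forward search: add attributes while the merit keeps improving."""
    selected, best = [], 0.0
    remaining = list(range(len(columns)))
    while remaining:
        candidates = [(cfs_merit(selected + [i], columns, label), i) for i in remaining]
        merit, idx = max(candidates, key=lambda t: t[0])
        if merit <= best:
            break
        selected.append(idx)
        remaining.remove(idx)
        best = merit
    return selected


# Hypothetical toy flows: (pkts, flag, label, number of identical rows).
groups = [(0, 0, 0, 8), (0, 1, 0, 4), (0, 1, 1, 4),
          (1, 0, 0, 3), (1, 0, 1, 5), (1, 1, 1, 8)]
rows = [(p, f, y) for p, f, y, n in groups for _ in range(n)]
pkts = [r[0] for r in rows]
bytes_ = [2 * r[0] for r in rows]   # redundant: a rescaled copy of pkts
flag = [r[1] for r in rows]         # weaker, but independent of pkts
label = [r[2] for r in rows]
columns = [pkts, bytes_, flag]

top2 = rank_features(columns, label)[:2]
subset = cfs_forward(columns, label)
print("top-2 by ranking:", top2)
print("CFS-style subset:", subset)
```

On this toy table, keeping the two top-ranked attributes retains both redundant columns, whereas the subset search keeps one of them together with the independent `flag` column. A wrapper method would instead retrain a classifier for each candidate subset, which is where the longer execution times come from.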