Optimizing feature selection in intrusion detection systems: Pareto dominance set approaches with mutual information and linear correlation

Guilherme Nunes Nasseh Barbosa, Martin Andreoni, Diogo Menezes Ferrazani Mattos

doi:10.1016/j.adhoc.2024.103485

Abstract

In network intrusion detection using machine learning, feature selection aims for computational efficiency, enhanced performance, and model interpretability, while preventing overfitting and improving data visualization. This paper proposes a filter method for feature selection that optimizes the information quantity of, and the linear correlation between, the resulting features. The method identifies Pareto-dominant pairs of informative and correlated features, constructs a graph from them, and selects key features based on betweenness centrality within each of the graph's connected components. The proposal yields a more concise and informative dataset representation. Experimental results on three diverse datasets show that the proposal achieves more than 95% accuracy in classifying network attacks using just 14% of the total number of features in the original datasets.
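As a rough illustration of the Pareto dominance step only, the sketch below filters a set of feature pairs down to those that no other pair dominates. It is not the authors' implementation. It assumes each pair carries two scores (an information score and an absolute linear correlation) and that higher is better for both, which the abstract does not state. The names `dominates` and `pareto_front`, the feature labels, and all numeric values are hypothetical.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]
# (information score, |linear correlation|); "higher is better" for both
# objectives is an assumption made for this sketch.
Scores = Tuple[float, float]


def dominates(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[Pair, Scores]) -> List[Pair]:
    """Return the pairs that no other pair dominates, in insertion order."""
    return [
        pair
        for pair, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != pair)
    ]


# Hypothetical scores, invented purely for illustration.
example: Dict[Pair, Scores] = {
    ("f1", "f2"): (0.90, 0.40),
    ("f1", "f3"): (0.70, 0.80),
    ("f2", "f3"): (0.60, 0.30),  # dominated by ("f1", "f2")
    ("f3", "f4"): (0.20, 0.95),
}

front = pareto_front(example)
print(front)
```

In a fuller pipeline, the surviving pairs could serve as edges of a graph over features, with one representative then chosen per connected component by betweenness centrality. That step is omitted here because the abstract does not specify how the graph is built.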

Full Text