Feature Selection and Model Evaluation for Threat Detection in Smart Grids

Mikołaj Gwiazdowicz,Marek Natkaniec

doi:10.3390/en16124632

Abstract

The rising interest in the security of network infrastructure, including edge devices, the Internet of Things, and smart grids, has led to the development of numerous machine learning-based approaches that promise improvement to existing threat detection solutions. Among the popular methods to ensuring cybersecurity is the use of data science techniques and big data to analyse online threats and current trends. One important factor is that these techniques can identify trends, attacks, and events that are invisible or not easily detectable even to a network administrator. The goal of this paper is to suggest the optimal method for feature selection and to find the most suitable method to compare results between different studies in the context of imbalance datasets and threat detection in ICT. Furthermore, as part of this paper, the authors present the state of the data science discipline in the context of the ICT industry, in particular, its applications and the most frequently employed methods of data analysis. Based on these observations, the most common errors and shortcomings in adopting best practices in data analysis have been identified. The improper usage of imbalanced datasets is one of the most frequently occurring issues. This characteristic of data is an indispensable aspect in the case of the detection of infrequent events. The authors suggest several solutions that should be taken into account while conducting further studies related to the analysis of threats and trends in smart grids.

Full Text