Влияние проблемы многозначности меток классов системных журналов на защищенность компьютерных сетей

Dmitriy I Rakovskiy

doi:10.36724/2409-5419-2023-15-1-48-56

Abstract

The security of information circulating in a computer network is related to the security of the supporting infrastructure. An important problem in the intelligent processing of syslog data is the existence of multi-label datasets. Among the Russian-language scientific publications, the problem under consideration in the context of information security of computer networks is not presented. Purpose: increase the security of computer networks by using multi-label learning methods when solving the problem of classifying system logs class labels. Results: A comparative analysis of single-valued and multi-label classifiers was carried out in a computational experiment on the Mean accuracy metric. A non-linear relationship was found between the proportion of experimental data sections containing multi-label class labels and the overall accuracy of data classification. Despite the fact that multilabel plots in the studied experimental data are only 3%, the gain in accuracy reaches 23% according to the specified metric. According to the results of the analysis, 80% of unambiguous classifiers were inferior in classification accuracy according to the Mean accuracy multi-label metric to their analogues, which may signal a strong influence of multi-label class labels on the models under consideration. It is shown that the considered structure of experimental data in a tabular form is affected by the multi-label problem much more strongly than it can be estimated by a standard frequency check, which actualizes further research in this direction. Practical relevance: The practical significance of the results obtained lies in increasing the security of computer networks through the use of a multi-label approach in the classification problem. The tasks of information security solved by multi-label classification may include: the area of monitoring, detection or prevention of violations and computer attacks in computer networks. Discussion: Since the predictive power of frequency testing of the influence of multi-label class label results on the classification results of unambiguous classifiers is low, further research on this topic is planned. It is planned to expand the list of classification quality assessment metrics in future experiments.

Full Text