Detection of traffic anomalies in the information systems of organizations using Machine Learning methods on the base of algorithms for forecasting category fields

G I Haidur

doi:10.31673/2412-4338.2021.044153

Open Access

Abstract

The article examines the problem of detecting anomalies in the network traffic of organizations' information systems. Detecting anomalies in network traffic makes it possible to identify hidden malicious activity in data obtained from protocols that collect statistics on the information system's network traffic. This, in turn, makes it possible to reduce the load and to configure the attributes used to monitor and analyze network traffic. The authors propose a network traffic anomaly detection architecture divided into functional levels. Protocols for collecting statistics were analyzed, namely NetFlow/IPFIX, which provides comprehensive information based on packet headers. To process and analyze the collected data, the authors developed a model for detecting anomalies in the traffic of the information system. The model takes the statistical data for further processing and can store data in a repository. All received data is filtered to detect malicious processes, then transferred to and stored in an attack database repository, with the ability to generate warnings and identify the attack. For this model, the use of Machine Learning based on methods for predicting categorical fields is proposed. The work used a dataset of firewall data containing the number and size of transmitted and received packets and data on the use of malicious software. Using this method, an experimental study was conducted to predict the presence of malicious software in the data. Categorical field prediction was investigated using Logistic Regression, SVM, Random Forest Classifier, and other classification algorithms. Based on the results, a confusion matrix was built, which makes it possible to estimate the error of the algorithms.
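
The final step described above, estimating classifier error from a confusion matrix, can be illustrated with a minimal sketch. The labels and predictions below are made-up values, not data from the study, and the helper names `confusion_matrix` and `error_rate` are hypothetical; the sketch uses plain Python and does not reproduce the authors' implementation.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix where rows are true labels and columns are predictions."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def error_rate(matrix):
    """Share of samples that fall off the diagonal (misclassified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - correct) / total if total else 0.0


# Illustrative values only (assumed, not taken from the paper's dataset).
labels = ["benign", "malicious"]
y_true = ["benign", "benign", "malicious", "malicious", "benign", "malicious"]
y_pred = ["benign", "malicious", "malicious", "benign", "benign", "malicious"]

matrix = confusion_matrix(y_true, y_pred, labels)
print(matrix)
print(round(error_rate(matrix), 3))
```

In practice, the predictions would come from a trained classifier such as those named in the abstract, and the same matrix could be computed per algorithm to compare their errors.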
