Abstract

This article presents an algorithm that automatically analyzes a log message, transforms it into a list of features represented as a fixed-length vector, and accumulates the resulting vectors into a single dataset. The resulting dataset is intended for use in machine-learning-based anomaly detection systems. An additional requirement for the algorithm is support for the diverse protocols used to collect log messages in a computer system. These goals were achieved by developing a software package that collects and parses log messages in order to isolate and encode their features. The package can collect log messages through several channels: syslog, SNMP, SQL, and reading of text and binary files. The article examines the data that can be extracted from the log messages of a computing system, describes support for Lua scripts for data enrichment, and defines the resulting list of features. It proposes a method for encoding text data extracted from log messages and an algorithm for transforming an arbitrary log message into a feature vector of fixed dimension. It also provides a methodology for forming a dataset for subsequent training of an anomaly detection system in a computing system, and gives an example of a dataset storage structure.
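To make the idea of a fixed-dimension feature vector concrete, the following minimal Python sketch maps log messages of any length to vectors of the same size and accumulates them into a dataset. It is an illustration only: the abstract does not specify the encoding, so the sketch assumes a generic hashing-trick encoding rather than the authors' method. The names `VECTOR_SIZE`, `tokenize`, `token_bucket`, `encode_message`, and `build_dataset`, the tokenization rule, and the sample messages are all hypothetical.

```python
import hashlib
import re
from typing import List

VECTOR_SIZE = 16  # hypothetical fixed dimension, chosen only for illustration


def tokenize(message: str) -> List[str]:
    """Split a raw log message into lowercase alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9.]+", message.lower())


def token_bucket(token: str, size: int) -> int:
    """Map a token to a stable index in [0, size) using an MD5 digest."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % size


def encode_message(message: str, size: int = VECTOR_SIZE) -> List[float]:
    """Encode a message of arbitrary length as a vector of exactly `size` counts."""
    vector = [0.0] * size
    for token in tokenize(message):
        vector[token_bucket(token, size)] += 1.0
    return vector


def build_dataset(messages: List[str], size: int = VECTOR_SIZE) -> List[List[float]]:
    """Accumulate the vectors of many messages into one list-of-rows dataset."""
    return [encode_message(m, size) for m in messages]


if __name__ == "__main__":
    # Invented sample messages, not taken from the article.
    samples = [
        "sshd[1024]: Failed password for root from 10.0.0.5",
        "kernel: eth0 link down",
    ]
    for row in build_dataset(samples):
        print(len(row), row)
```

Every row has the same length regardless of how long the source message is, which is the property a downstream anomaly detection model needs. The cost of this particular encoding is that distinct tokens may collide in the same bucket; a larger `VECTOR_SIZE` reduces collisions at the expense of sparser vectors.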

