Automated Log Analysis and Anomaly Detection Using Machine Learning

Ali Hussain Shah, Esmaeil Habib Zadeh, Daem Pasha, Savas Konur

doi:10.3233/faia220378

Abstract

Reducing the number of alerts and anomalies has been the focus of several studies, but automated anomaly detection using log files remains an open challenge. One of the pertinent challenges in detecting anomalies from log files is dealing with ‘unlabelled’ data. Existing approaches are hampered by a lack of anomalous examples and by the fact that log anomalies can follow many different patterns. One solution is to label the data manually, but this is tedious because the data can be very large and log files are not easily understandable. In this paper, we present an automated anomaly detection model that combines supervised and unsupervised machine learning with domain knowledge. Our method reduces the number of alerts by accurately predicting anomalous log events. Domain expertise is used to create automated rules that generate a labelled dataset from unlabelled log records, which are unstructured and come in many different formats. This labelled dataset is then used to train a classification model that predicts anomalous log events. Our results show that we can accurately predict anomalous and non-anomalous events with an average accuracy of 98%. Our approach offers a practical solution for systems where logs are collected without any labelling, which makes it difficult to create an accurate model to identify anomalous log records. The presented methodology is fast and efficient, and can provide real-time anomaly detection for time-critical environments.
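
The two-stage idea in the abstract (rules derived from domain knowledge label the raw logs, then a classifier is trained on those labels) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the abstract does not state the rules, features, or classifier used, so the regular expressions in `ANOMALY_RULES`, the sample log lines, and the small naive Bayes classifier are hypothetical stand-ins, and the unsupervised component mentioned in the abstract is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical domain rules: patterns an expert might flag as anomalous.
ANOMALY_RULES = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\berror\b", r"\bfail(ed|ure)?\b", r"\btimeout\b")
]


def label_with_rules(line):
    """Return 1 (anomalous) if any rule matches the line, else 0."""
    return int(any(rule.search(line) for rule in ANOMALY_RULES))


def tokenize(line):
    """Lowercase alphabetic tokens; timestamps and numeric ids are dropped."""
    return re.findall(r"[a-z]+", line.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, lines, labels):
        self.word_counts = {0: Counter(), 1: Counter()}
        self.class_counts = Counter(labels)
        for line, label in zip(lines, labels):
            self.word_counts[label].update(tokenize(line))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def predict(self, line):
        total = sum(self.class_counts.values())
        scores = {}
        for label in (0, 1):
            n_words = sum(self.word_counts[label].values())
            # Smoothed log prior plus smoothed log likelihood of each token.
            score = math.log((self.class_counts[label] + 1) / (total + 2))
            for tok in tokenize(line):
                count = self.word_counts[label][tok]
                score += math.log((count + 1) / (n_words + len(self.vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up, unlabelled log records used only to exercise the sketch.
raw_logs = [
    "2022-01-01 10:00:01 INFO service started on port 8080",
    "2022-01-01 10:00:05 INFO user login succeeded",
    "2022-01-01 10:01:12 ERROR database connection refused",
    "2022-01-01 10:02:40 WARN request timeout while contacting database",
    "2022-01-01 10:03:00 INFO heartbeat ok",
    "2022-01-01 10:04:18 ERROR disk write failed",
]

# Stage 1: rules turn unlabelled records into a labelled dataset.
labels = [label_with_rules(line) for line in raw_logs]

# Stage 2: a classifier is trained on the rule-generated labels.
model = NaiveBayes().fit(raw_logs, labels)
```

Calling `model.predict(line)` on a new record returns 1 for a predicted anomaly and 0 otherwise. Because the classifier learns from all tokens in the labelled records rather than only the rule keywords, it may flag lines that no rule matches directly; how well this carries over to real logs depends on the quality of the rules and on the chosen features.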

Full Text