Multi Perspective Robust Deep Analysis Cyber Anomaly Detection on HDFS Log using Transformers

Yogaraja G S R, Gowtham K A, Deepak M, Kiran S, Laxmipathi R

doi:10.48175/ijarsct-9738

Abstract

Log analysis is one of the main techniques engineers use to troubleshoot faults in large-scale software systems. Over the past decades, many log analysis approaches have been proposed to detect system anomalies reflected in logs. They usually take log event counts or sequential log events as inputs and use machine learning algorithms, including deep learning models, to detect system anomalies. These anomalies are often identified as violations of quantitative relational patterns or sequential patterns of log events in log sequences. While large-scale systems provide users with rich services, they also bring new security and reliability challenges, one of which is locating system faults and discovering potential issues. Anomaly detection has been widely studied in the context of network data, but operational data presents several new challenges, including the volatility and sparseness of the data and the need for fast detection, which complicates the application of schemes that require offline processing or large, stable data sets to converge. This paper proposes a log sequence anomaly detection method based on neural network training and feature extraction. The method uses BERT (Bidirectional Encoder Representations from Transformers) to extract the semantic features and statistical features of the log sequence.
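As an illustration of the two input representations mentioned above (sequential log events and log event counts), the following is a minimal sketch rather than the method proposed in this paper. It assumes simplified, hypothetical HDFS-style log lines, groups them into per-block sequences by block ID, and derives an event count vector for each sequence. The sample lines, the masking rules, and the function names `to_template`, `group_by_block`, and `count_vector` are all assumptions made for the example; the BERT feature extraction step is not shown.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, simplified HDFS-style log lines (not taken from the paper's dataset).
SAMPLE_LOGS = [
    "081109 203518 INFO dfs.DataNode: Receiving block blk_1 src: /10.0.0.1:5000 dest: /10.0.0.2:5001",
    "081109 203519 INFO dfs.DataNode: PacketResponder 1 for block blk_1 terminating",
    "081109 203520 INFO dfs.DataNode: Receiving block blk_2 src: /10.0.0.3:5000 dest: /10.0.0.4:5001",
    "081109 203521 INFO dfs.DataNode: Receiving block blk_1 src: /10.0.0.5:5000 dest: /10.0.0.2:5001",
]

# Assumed masking rules that turn a raw message into a log event template.
BLOCK_ID = re.compile(r"blk_-?\d+")
IP_PORT = re.compile(r"/?\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?")
NUMBER = re.compile(r"\b\d+\b")


def to_template(message):
    """Replace variable parts (block IDs, addresses, numbers) with placeholders."""
    message = BLOCK_ID.sub("<BLK>", message)
    message = IP_PORT.sub("<IP>", message)
    return NUMBER.sub("<NUM>", message)


def group_by_block(lines):
    """Build one ordered sequence of event templates per block ID."""
    sessions = defaultdict(list)
    for line in lines:
        # Assumption: the message body follows the first ": " in the line.
        _, _, message = line.partition(": ")
        template = to_template(message)
        for block in sorted(set(BLOCK_ID.findall(line))):
            sessions[block].append(template)
    return dict(sessions)


def count_vector(sequence, vocabulary):
    """Count how often each template in the vocabulary occurs in a sequence."""
    counts = Counter(sequence)
    return [counts[template] for template in vocabulary]


sessions = group_by_block(SAMPLE_LOGS)
vocabulary = sorted({t for seq in sessions.values() for t in seq})
vectors = {block: count_vector(seq, vocabulary) for block, seq in sessions.items()}
```

Here `sessions` holds the sequential view (ordered templates per block) and `vectors` holds the quantitative view (counts per template). A semantic approach such as the one described above would instead pass the template text to a language model to obtain embeddings.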
