As information systems grow larger and more complex, they generate massive volumes of log data. These logs record the health of the system, but their sheer volume means that traditional anomaly detection algorithms, which generalize poorly at this scale, struggle to detect anomalies efficiently and accurately. A BERT-based log anomaly detection algorithm, LADB, is proposed; it is essentially a semi-supervised, multi-class classification algorithm. (1) LADB uses the Transformer encoder as its base component to mitigate problems such as feature degradation and exploding gradients. (2) To make better use of bidirectional context, and in view of BERT's success in the NLP field, a Masked Log Key Prediction (MLKP) self-supervised task is designed, drawing on the idea of BERT's Masked Language Model. (3) To address the difficulty and slowness of processing high-dimensional data, the Deep SVDD algorithm is used for a minimum hypersphere volume (VHM) self-supervised training task. Experiments show that the overall performance of LADB is superior to that of four representative log anomaly detection algorithms.
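The following is a minimal PyTorch-style sketch, not the authors' code, of the two self-supervised objectives named above: a BERT-style masked-token cross-entropy loss standing in for MLKP, and the standard Deep SVDD squared-distance-to-centre loss standing in for VHM. The dimensions, special token ids, encoder depth, pooling, and equal loss weighting are illustrative assumptions rather than LADB's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LADBSketch(nn.Module):
    def __init__(self, num_log_keys=200, d_model=128, nhead=4, num_layers=2):
        super().__init__()
        # Vocabulary = log keys plus two special ids: [PAD]=0 and [MASK]=1 (assumed layout)
        self.embed = nn.Embedding(num_log_keys + 2, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.mlkp_head = nn.Linear(d_model, num_log_keys + 2)  # predicts the original key at masked positions
        # Deep SVDD hypersphere centre; in Deep SVDD it is fixed after an initial pass over the data
        self.register_buffer("center", torch.zeros(d_model))

    def forward(self, masked_keys, target_keys):
        h = self.encoder(self.embed(masked_keys))               # (batch, seq_len, d_model)
        # MLKP-style loss: cross-entropy only on masked positions (unmasked targets set to -100)
        mlkp_loss = F.cross_entropy(self.mlkp_head(h).transpose(1, 2),
                                    target_keys, ignore_index=-100)
        # VHM-style loss: shrink the hypersphere by pulling sequence embeddings towards the centre
        seq_repr = h.mean(dim=1)                                 # (batch, d_model)
        vhm_loss = ((seq_repr - self.center) ** 2).sum(dim=1).mean()
        return mlkp_loss + vhm_loss                              # equal weighting is an assumption

# Toy usage: a batch of 4 sequences of 20 log keys, with every 7th position masked out
model = LADBSketch()
keys = torch.randint(2, 202, (4, 20))
mask = torch.zeros(4, 20, dtype=torch.bool)
mask[:, ::7] = True
masked = keys.masked_fill(mask, 1)        # replace masked positions with [MASK]=1
targets = keys.masked_fill(~mask, -100)   # ignore unmasked positions in the MLKP loss
loss = model(masked, targets)
loss.backward()
```

At inference time, Deep SVDD typically scores a sample by its squared distance to the hypersphere centre; how LADB combines that score with the MLKP predictions is detailed in the full paper.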