Abstract
Log keyword extraction is an indispensable part of log anomaly detection. It faces two main challenges. First, logs are inherently unstructured, and different vendors usually define different log formats. Second, most traditional methods cannot update log keywords incrementally to match newly generated log data, so their extraction accuracy is low. To solve these problems, we introduce OILog, an online incremental keyword extraction method. The essential idea of this method is that log templates are usually the longest combination of high-frequency words. OILog builds its model with a deep Long Short-Term Memory (LSTM) network that captures, in real time, both high-frequency log keywords and new log keywords generated by the system, so that unstructured raw logs can be transformed into structured logs quickly. To improve the efficiency and accuracy of the model, we propose an improved Particle Swarm Optimization (PSO) algorithm, which replaces the traditional PSO topology with a multilayer structure and applies a new particle velocity update formula to increase the attraction between particles. We summarize previous work and validate OILog on real log data collected from four systems. The results show that OILog is superior in terms of both accuracy and robustness.
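The intuition that log templates are built from high-frequency words can be illustrated with a minimal frequency-based sketch. This is not the OILog method, which relies on an LSTM model and an improved PSO; it is only a toy illustration of the underlying idea. The function name `extract_templates`, the `min_ratio` threshold, and the `<*>` placeholder are assumptions introduced here for illustration.

```python
from collections import Counter


def extract_templates(log_lines, min_ratio=0.6, placeholder="<*>"):
    """Keep tokens that occur in many lines; mask the rest as variables.

    Illustrative only: a token is treated as a keyword when it appears
    in at least `min_ratio` of the input lines (hypothetical threshold).
    """
    tokenized = [line.split() for line in log_lines]

    # Count the number of lines in which each token appears.
    line_counts = Counter()
    for tokens in tokenized:
        line_counts.update(set(tokens))

    threshold = min_ratio * len(tokenized)

    # Rebuild each line, replacing low-frequency tokens with the placeholder.
    return [
        " ".join(t if line_counts[t] >= threshold else placeholder for t in tokens)
        for tokens in tokenized
    ]


if __name__ == "__main__":
    logs = [
        "Connection from 10.0.0.1 closed",
        "Connection from 10.0.0.2 closed",
        "Connection from 10.0.0.3 closed",
    ]
    for template in extract_templates(logs):
        print(template)
```

In this toy input the words shared by all lines survive as the template, while the varying address is masked. A static count like this cannot adapt to newly generated log formats, which is the limitation the incremental approach is meant to address.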