LogClass: Anomalous Log Identification and Classification With Partial Labels

Weibin Meng, Dan Pei, Shenglin Zhang, Yuheng Huang, Federico Zaiter, Ying Liu, Zhaoyang Yu, Yuzhi Zhang, Ming Zhang, Yuzhe Zhang, Lei Song

doi:10.1109/tnsm.2021.3055425

Abstract

Logs are essential to the management of networks and services. However, manually identifying and classifying anomalous logs is time-consuming, error-prone, and labor-intensive. Moreover, rule-based approaches cannot handle two challenges underlying anomalous log identification and classification: new types of logs and partial labels. We propose LogClass, a framework that automatically and robustly identifies and classifies anomalous logs for networks and services based on partial labels. LogClass combines a word representation method, a positive and unlabeled learning (PU learning) model, and a machine learning classifier. In addition, we propose a novel Inverse Location Frequency (ILF) method to properly weight the words of logs during feature construction. We evaluate LogClass on more than 18 million real-world switch logs and six public log datasets. It achieves F1 scores of 99.56% and 98% in anomalous log identification on the switch logs and on publicly available supercomputer logs, respectively, and an F1 score very close to one in anomalous log classification. We have also conducted extensive experiments to demonstrate LogClass's superior performance in handling partial labels and new types of logs.

Full Text