A Heuristic Approach Using Template Miners for Error Prediction in Telecommunication Networks

Peter Marjai, Attila Kiss, Peter Lehotay-Kery

doi:10.1109/access.2022.3221427

Open Access

Abstract

With the appearance of large-scale systems, the volume of generated logs has increased rapidly. Almost every piece of software produces such files. Log files contain runtime information about the software and indicate noteworthy events or suspicious behavior such as errors. They are therefore a valuable source of information for understanding and monitoring the operation of a system, and can be used to predict upcoming anomalies. In recent years, numerous techniques have been proposed for this purpose. There are supervised models such as SVM or decision trees, and unsupervised ones such as Isolation Forest, Log Clustering, or PCA. There are also methods that use deep learning, such as Autoencoder, CNN, LSTM, or Transformer. Many of these methods rely on template miners, which extract the event types from the unstructured data. In this paper, we propose a method that uses these templates to predict upcoming anomalies. We use 80% of our data for training and 20% for testing. First, we use half of the training data to select the templates that have at least one occurrence followed by an error, which yields a list of candidate templates. Second, we use the other half to check how often the ten lines following a candidate template actually contain an anomaly. If a given percentage is reached, we consider the template an indicator of upcoming anomalies. We conduct various experiments to verify the capability of our method, measuring precision, recall, F-score, accuracy, and speed on various data sources. The proposed method falls slightly behind SVM and CNN, with an average of 88.06% precision, 90.43% recall, and 89.11% F-score; however, it achieves better accuracy at 98.19%. In addition, our algorithm is two times faster than SVM and three and a half times faster than CNN.
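
The two-step heuristic described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the log is assumed to be already parsed into (template, is_anomaly) pairs by a template miner, "followed by an error" in the first step is assumed to use the same ten-line window as the second step, and the function names and the `threshold` value of 0.5 are hypothetical placeholders for the "given percentage".

```python
from collections import defaultdict

WINDOW = 10  # the abstract checks the ten lines after a candidate template


def anomaly_follows(events, i, window=WINDOW):
    """Return True if any of the `window` events after position i is an anomaly."""
    return any(is_anomaly for _, is_anomaly in events[i + 1 : i + 1 + window])


def find_candidates(events, window=WINDOW):
    """Step 1: templates with at least one occurrence followed by an anomaly.

    Assumption: "followed by" means "within the next `window` lines".
    """
    return {
        template
        for i, (template, _) in enumerate(events)
        if anomaly_follows(events, i, window)
    }


def select_indicators(events, candidates, threshold=0.5, window=WINDOW):
    """Step 2: keep candidates whose share of occurrences followed by an
    anomaly reaches `threshold` (a hypothetical value, not from the paper)."""
    seen = defaultdict(int)
    hits = defaultdict(int)
    for i, (template, _) in enumerate(events):
        if template in candidates:
            seen[template] += 1
            hits[template] += anomaly_follows(events, i, window)
    return {t for t in seen if hits[t] / seen[t] >= threshold}


def split_and_train(events, train_fraction=0.8, threshold=0.5):
    """Use 80% of the data for training; the first half of that portion
    yields candidates, the second half confirms them as indicators."""
    train = events[: int(len(events) * train_fraction)]
    half = len(train) // 2
    candidates = find_candidates(train[:half])
    return select_indicators(train[half:], candidates, threshold)
```

In this sketch, a template returned by `split_and_train` would raise a warning whenever it appears in the remaining 20% of the data, and precision, recall, F-score, and accuracy would then be computed by comparing those warnings with the anomalies that actually follow.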

Full Text