Abstract

Nowadays, almost every piece of software produces a large number of log messages recording events and activities during its use. These files contain valuable runtime information that can be used in a variety of applications such as anomaly detection, error prediction, and template mining. The generated log messages are usually raw, i.e., they have an unstructured format, so they have to be parsed before data mining models can be applied. After parsing, template miners can be applied to the data to retrieve the events occurring in the log file. Each event consists of two parts: the template, a fixed part that is identical for all instances of the same event type, and the parameters, which vary from instance to instance. To decrease the size of the log messages, we use the mined templates to build a dictionary of the events and store only the dictionary, the event IDs, and the parameter lists. We use six template miners to acquire the templates, namely IPLoM, LenMa, LogMine, Spell, Drain, and MoLFI. In this paper, we evaluate the compression capacity of our dictionary method combined with these algorithms. Since parameters may contain sensitive information, we also encrypt the files after compression and measure the change in file size. We also examine the speed of the template miner algorithms. Based on our experiments, LenMa achieves the best compression rate with an average of 67.4%; however, because of its high runtime, we suggest combining our dictionary method with IPLoM and FFX, since it is the fastest of all the methods and still achieves a 57.7% compression rate.
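
To make the dictionary idea concrete, the following is a minimal Python sketch of the encoding step, assuming templates use the common <*> wildcard notation for parameter positions; the helper names and example templates are illustrative and are not taken from the paper:

    import re

    # Minimal sketch of dictionary-based log encoding.
    # Assumption: templates mark parameter positions with "<*>",
    # as produced by miners such as Drain.

    def build_dictionary(templates):
        """Assign a small integer ID to every mined template."""
        return {template: event_id for event_id, template in enumerate(templates)}

    def template_to_regex(template):
        """Turn 'User <*> logged in from <*>' into a capturing regex."""
        return re.compile(
            "^" + re.escape(template).replace(re.escape("<*>"), "(.+?)") + "$"
        )

    def encode_line(line, dictionary, compiled):
        """Return (event_id, parameter_list) for the first matching template."""
        for template, event_id in dictionary.items():
            match = compiled[template].match(line)
            if match:
                return event_id, list(match.groups())
        return None, [line]  # unmatched lines are kept verbatim

    templates = ["User <*> logged in from <*>", "Disk usage at <*>%"]
    dictionary = build_dictionary(templates)
    compiled = {t: template_to_regex(t) for t in templates}

    print(encode_line("User alice logged in from 10.0.0.7", dictionary, compiled))
    # (0, ['alice', '10.0.0.7'])

Only the template dictionary, the event IDs, and the extracted parameter lists need to be stored, which is where the size reduction comes from; the original line can be reconstructed by substituting the parameters back into the template.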

Highlights

  • Creating logs is a common practice in programming; logs are used to store runtime information about a software system

  • The authors of “Anomaly Detection from Log Files Using Data Mining Techniques” [1] proposed an anomaly-based approach using data mining of logs, and the overall error rates of their method were below 10%

  • We tested the compression efficiency on four distinct log message collections varying in size


Summary

Introduction

Creating logs is a common practice in programming; logs are used to store runtime information about a software system. In “Categorical Feature Compression via Submodular Optimization” [22], the authors designed a vocabulary compression algorithm, a novel parametrization of the mutual information objective, a data structure to query submodular functions, and a distributed implementation. They also provided an analysis of simple alternative heuristic compression methods. In “On the Feasibility of Parser-based Log Compression in Large-Scale Cloud Systems” [24], the authors built LogReducer on three techniques for compressing numerical values in system logs: delta timestamps, correlation identification, and elastic encoding. Their evaluation showed that it achieves a high compression ratio on large logs, with speed comparable to general-purpose compression algorithms. Various encryption methods have also been proposed [29,30].
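
As an illustration of the delta-timestamp idea mentioned above, a minimal sketch of such an encoding is given below; it demonstrates only the general technique, not the authors' LogReducer implementation:

    # Delta-timestamp encoding: store the first timestamp and the
    # (usually small) differences between consecutive timestamps,
    # which compress much better than the raw values.

    def delta_encode(timestamps):
        deltas = [timestamps[0]]
        deltas += [b - a for a, b in zip(timestamps, timestamps[1:])]
        return deltas

    def delta_decode(deltas):
        timestamps = [deltas[0]]
        for d in deltas[1:]:
            timestamps.append(timestamps[-1] + d)
        return timestamps

    ts = [1625097600123, 1625097600125, 1625097600140, 1625097601002]
    encoded = delta_encode(ts)
    print(encoded)                      # [1625097600123, 2, 15, 862]
    assert delta_decode(encoded) == ts  # lossless round trip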

Materials and Methods
Encryption Techniques
Blowfish
Results
Experimental Analysis
Experiment 1
Experiment 3
Discussion and Conclusions
