Abstract

Nowadays, almost every piece of software produces a large number of log messages recording events and activities during its use. These files contain valuable runtime information that can be used in a variety of applications such as anomaly detection, error prediction, and template mining. The generated log messages are usually raw, i.e., they have an unstructured format, so they have to be parsed before data mining models can be applied. After parsing, template miners can be applied to the data to retrieve the events occurring in the log file. Each event consists of two parts: the template, a fixed part that is identical for all instances of the same event type, and the parameters, which vary from instance to instance. To decrease the size of the log messages, we use the mined templates to build a dictionary of the events and store only the dictionary, the event IDs, and the parameter lists. We use six template miners to acquire the templates, namely IPLoM, LenMa, LogMine, Spell, Drain, and MoLFI. In this paper, we evaluate the compression capacity of our dictionary method combined with these algorithms. Since parameters may contain sensitive information, we also encrypt the files after compression and measure the change in file size. We also examine the speed of the template miner algorithms. Based on our experiments, LenMa achieves the best compression rate with an average of 67.4%; however, because of its high runtime, we suggest combining our dictionary method with IPLoM and FFX, since it is the fastest of all the methods and still achieves a 57.7% compression rate.
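
To make the dictionary idea concrete, the following is a minimal Python sketch of the encoding step, assuming templates use the common <*> wildcard notation for parameter positions; the helper names and example templates are illustrative and are not taken from the paper:

    import re

    # Minimal sketch of dictionary-based log encoding.
    # Assumption: templates mark parameter positions with "<*>",
    # as produced by miners such as Drain.

    def build_dictionary(templates):
        """Assign a small integer ID to every mined template."""
        return {template: event_id for event_id, template in enumerate(templates)}

    def template_to_regex(template):
        """Turn 'User <*> logged in from <*>' into a capturing regex."""
        return re.compile(
            "^" + re.escape(template).replace(re.escape("<*>"), "(.+?)") + "$"
        )

    def encode_line(line, dictionary, compiled):
        """Return (event_id, parameter_list) for the first matching template."""
        for template, event_id in dictionary.items():
            match = compiled[template].match(line)
            if match:
                return event_id, list(match.groups())
        return None, [line]  # unmatched lines are kept verbatim

    templates = ["User <*> logged in from <*>", "Disk usage at <*>%"]
    dictionary = build_dictionary(templates)
    compiled = {t: template_to_regex(t) for t in templates}

    print(encode_line("User alice logged in from 10.0.0.7", dictionary, compiled))
    # (0, ['alice', '10.0.0.7'])

Only the template dictionary, the event IDs, and the extracted parameter lists need to be stored, which is where the size reduction comes from; the original line can be reconstructed by substituting the parameters back into the template.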

Highlights

  • Creating logs is a common practice in programming; logs are used to store runtime information about a software system

  • The authors of “Anomaly Detection from Log Files Using Data Mining Techniques” [1] proposed an anomaly-based approach using data mining of logs, and the overall error rates of their method were below 10%

  • We tested the compression efficiency on four distinct log message collections varying in size


Summary

Introduction

Creating logs is a common practice in programming; logs are used to store runtime information about a software system. In “Categorical Feature Compression via Submodular Optimization” [22], the authors designed a vocabulary compression algorithm, a novel parametrization of the mutual information objective, a data structure to query submodular functions, and a distributed implementation. They also provided an analysis of simple alternative heuristic compression methods. In “On the Feasibility of Parser-based Log Compression in Large-Scale Cloud Systems” [24], the authors built LogReducer on three techniques for compressing numerical values in system logs: delta timestamps, correlation identification, and elastic encoding. Their evaluation showed that it achieves a high compression ratio on large logs, with speed comparable to general-purpose compression algorithms. Various encryption methods have also been proposed [29,30].
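
As an illustration of the delta-timestamp idea mentioned above, a minimal sketch of such an encoding is given below; it demonstrates only the general technique, not the authors' LogReducer implementation:

    # Delta-timestamp encoding: store the first timestamp and the
    # (usually small) differences between consecutive timestamps,
    # which compress much better than the raw values.

    def delta_encode(timestamps):
        deltas = [timestamps[0]]
        deltas += [b - a for a, b in zip(timestamps, timestamps[1:])]
        return deltas

    def delta_decode(deltas):
        timestamps = [deltas[0]]
        for d in deltas[1:]:
            timestamps.append(timestamps[-1] + d)
        return timestamps

    ts = [1625097600123, 1625097600125, 1625097600140, 1625097601002]
    encoded = delta_encode(ts)
    print(encoded)                      # [1625097600123, 2, 15, 862]
    assert delta_decode(encoded) == ts  # lossless round trip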

Materials and Methods
Encryption Techniques
Blowfish
Results
Experimental Analysis
Experiment 1
Experiment 3
Discussion and Conclusions
