Abstract
As the first and foremost step of typical automatic log analysis, log parsing has attracted a lot of interest. Most existing studies treat log messages as pure strings and rely on string matching or string distance. In NLP, word2vec has proven very efficient and effective at representing words as low-dimensional vectors. Inspired by this, we propose a novel method, called LPV (Log Parser based on Vectorization), for both offline and online log parsing. The central idea of our method in offline log parsing is to first convert log messages into vectors and measure the similarity between two log messages by the distance between their vectors. Log messages can then be clustered by clustering the vectors, and log templates can be extracted from the resulting clusters. For online log parsing, we also assign each log template a form of average vector, so that the similarity between an incoming log message and each log template can likewise be measured by the distance between two vectors. We have conducted extensive experiments on three widely used log datasets, and the results demonstrate that LPV achieves competitive performance compared with state-of-the-art log parsing methods.
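The pipeline described above (vectorize, cluster, extract templates, then match incoming messages against template vectors) can be illustrated with a minimal sketch. It is not the LPV implementation: one-hot token vectors stand in for trained word2vec embeddings, a greedy threshold pass stands in for the clustering step, position-wise masking stands in for template extraction, and the 0.6 threshold, the sample log lines, and all function names are assumptions made for illustration.

```python
import math
from itertools import zip_longest


def message_vector(message):
    """Average the per-token vectors of a message.

    Assumption: each token is a one-hot vector (stored sparsely as a dict),
    a stand-in for a trained word2vec embedding.
    """
    tokens = message.split()
    vec = {}
    for tok in tokens:
        vec[tok] = vec.get(tok, 0.0) + 1.0 / len(tokens)
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of sparse vectors."""
    total = {}
    for vec in vectors:
        for k, w in vec.items():
            total[k] = total.get(k, 0.0) + w
    return {k: w / len(vectors) for k, w in total.items()}


def cluster_offline(messages, threshold=0.6):
    """Greedy clustering: join the most similar cluster, else start a new one."""
    clusters = []
    for msg in messages:
        vec = message_vector(msg)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, mean_vector(cluster["vectors"]))
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"messages": [msg], "vectors": [vec]})
        else:
            best["messages"].append(msg)
            best["vectors"].append(vec)
    return clusters


def extract_template(messages):
    """Keep tokens that agree at each position; mask the others with <*>."""
    rows = [m.split() for m in messages]
    return " ".join(
        col[0] if len(set(col)) == 1 else "<*>" for col in zip_longest(*rows)
    )


def match_online(message, templates, threshold=0.6):
    """Return the template whose mean vector is closest, or None if none is close."""
    vec = message_vector(message)
    best, best_sim = None, threshold
    for template, template_vec in templates:
        sim = cosine(vec, template_vec)
        if sim >= best_sim:
            best, best_sim = template, sim
    return best


# Hypothetical sample log lines, for illustration only.
logs = [
    "Connection from 10.0.0.1 closed",
    "Disk usage at 91%",
    "Connection from 10.0.0.2 closed",
    "Disk usage at 87%",
]
clusters = cluster_offline(logs)
templates = [
    (extract_template(c["messages"]), mean_vector(c["vectors"])) for c in clusters
]
print([t for t, _ in templates])
# ['Connection from <*> closed', 'Disk usage at <*>']
print(match_online("Connection from 10.0.0.9 closed", templates))
# Connection from <*> closed
```

With one-hot token vectors the cosine similarity reduces to a form of token overlap; swapping in trained embeddings would let messages with different but related tokens land close together, which is the motivation the abstract gives for vectorization.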