Abstract
Raw log messages record rich dynamic information about the running of systems, networks, and applications, which makes them a good data source for anomaly detection. Log template extraction is an important prerequisite for log sequence anomaly detection. However, most existing log template extraction methods are offline, and the few online methods achieve insufficient F1-scores on multi-source log data. To address these shortcomings, an online log template extraction method called LogOHC is proposed. First, the raw log messages are preprocessed, and distributed word representations (word2vec) are used to vectorize the log messages online. Then, an online hierarchical clustering algorithm is applied, and finally, log templates are generated. Experimental analysis shows that LogOHC achieves a higher F1-score than existing log template extraction methods, is suitable for multi-source log data sets, and has a shorter single-step execution time, allowing it to meet the requirements of online real-time processing.
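The first and last stages of the pipeline described above (preprocessing and template generation) can be illustrated with a minimal sketch. It assumes that preprocessing masks obvious variables (IP addresses, hexadecimal values, numbers) with regular expressions and that a template is produced by comparing messages of equal length position by position, replacing differing tokens with a wildcard. The masking rules, the "<*>" wildcard, and the function names preprocess and merge_template are hypothetical illustrations, not LogOHC's actual rules.

```python
import re

# Hypothetical masking rules (assumption): the paper's real preprocessing rules may differ.
# Order matters: IPs are masked before plain numbers so "10.0.0.1" becomes one token.
MASKS = [
    (re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"), "<IP>"),
    (re.compile(r"\b0x[0-9a-fA-F]+\b"), "<HEX>"),
    (re.compile(r"\b\d+\b"), "<NUM>"),
]


def preprocess(raw):
    """Mask obvious variable fields, then split the message on whitespace."""
    text = raw.strip()
    for pattern, placeholder in MASKS:
        text = pattern.sub(placeholder, text)
    return text.split()


def merge_template(template, tokens):
    """Keep tokens that agree at each position; replace differing ones with <*>.

    This position-wise merge only works for messages with the same token count.
    """
    if len(template) != len(tokens):
        raise ValueError("messages of different length cannot be merged position-wise")
    return [a if a == b else "<*>" for a, b in zip(template, tokens)]


if __name__ == "__main__":
    a = preprocess("User alice logged in from 10.0.0.1 after 3 attempts")
    b = preprocess("User bob logged in from 192.168.1.20 after 12 attempts")
    print(" ".join(merge_template(a, b)))
    # User <*> logged in from <IP> after <NUM> attempts
```

Masking variables before clustering keeps frequently changing values such as addresses and counters from making messages that share a template look dissimilar.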
Highlights
The network environment is increasingly complex, and attacks against network applications and different systems constantly emerge, often combining multiple attack methods.
The main contributions and innovations of this paper are summarized as follows: (1) On the basis of data preprocessing, this paper vectorizes log messages online using the mean-value word2vec algorithm, providing a high-quality data source for online hierarchical clustering. Because it applies ideas from natural language processing to log processing, the method does not depend on the log format; (2) This paper proposes an online log template extraction method based on online hierarchical clustering, meeting the needs of online log data processing; (3) Thorough experiments have been conducted on three real-world log data sets.
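Contributions (1) and (2) can be sketched roughly as follows. The mean-value idea is that a message vector is the average of its word vectors, v(m) = (1/n)(w_1 + ... + w_n). The toy 3-dimensional vectors below are hypothetical stand-ins for a trained word2vec model, and the clusterer is a simplified single-pass threshold algorithm (assign each new vector to the most similar centroid by cosine similarity if it clears a threshold, otherwise open a new cluster). It is not LogOHC's online hierarchical clustering; the names TOY_VECTORS, mean_vector, OnlineClusterer, and the threshold 0.9 are assumptions for illustration.

```python
import math

# Hypothetical toy word vectors standing in for a trained word2vec model.
TOY_VECTORS = {
    "connection": [0.9, 0.1, 0.0],
    "closed": [0.8, 0.2, 0.1],
    "opened": [0.7, 0.3, 0.1],
    "disk": [0.0, 0.9, 0.4],
    "full": [0.1, 0.8, 0.5],
}


def mean_vector(tokens, vectors, dim=3):
    """Mean-value message embedding: average the vectors of known tokens."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(col) / len(known) for col in zip(*known)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


class OnlineClusterer:
    """Single-pass threshold clustering (simplified stand-in, not LogOHC itself)."""

    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.centroids = []  # running mean vector of each cluster
        self.sizes = []      # number of messages assigned to each cluster

    def add(self, vec):
        """Assign one message vector as it arrives and return its cluster id."""
        best, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best >= 0 and best_sim >= self.threshold:
            n = self.sizes[best] + 1
            # Incremental mean update, so past messages need not be stored.
            self.centroids[best] = [c + (x - c) / n for c, x in zip(self.centroids[best], vec)]
            self.sizes[best] = n
            return best
        self.centroids.append(list(vec))
        self.sizes.append(1)
        return len(self.centroids) - 1


if __name__ == "__main__":
    clusterer = OnlineClusterer(threshold=0.9)
    stream = [["connection", "closed"], ["connection", "opened"], ["disk", "full"]]
    labels = [clusterer.add(mean_vector(tokens, TOY_VECTORS)) for tokens in stream]
    print(labels)  # [0, 0, 1]
```

Each message is handled once as it arrives and only centroids are kept, which is the property that makes this style of clustering usable online; a hierarchical variant would additionally organize or merge these clusters at several levels.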
Our study shows that, compared with state-of-the-art methods, LogOHC is more effective and also performs better in terms of efficiency and parameter sensitivity.
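Effectiveness here is reported as an F1-score. The exact definition used in the paper is not given in this summary; one common convention for evaluating log template extraction is a pairwise F1, sketched below under that assumption. Every pair of messages counts as a true positive if both the ground truth and the method put the two messages in the same template group, as a false positive if only the method does, and as a false negative if only the ground truth does. The function name pairwise_f1 is hypothetical.

```python
from itertools import combinations


def pairwise_f1(true_labels, pred_labels):
    """Pairwise F1 over all message pairs (an assumed metric definition)."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label lists must have equal length")
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        if same_pred and same_true:
            tp += 1  # correctly grouped together
        elif same_pred:
            fp += 1  # grouped together by the method, but different templates
        elif same_true:
            fn += 1  # same template, but split apart by the method
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Ground truth: two templates A and B; the method wrongly merges one B message into A.
    print(round(pairwise_f1(["A", "A", "B", "B"], [0, 0, 0, 1]), 3))  # 0.4
```

Enumerating all pairs is quadratic in the number of messages, so this form suits small examples; large data sets would usually count pairs from a contingency table of group sizes instead.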
Summary
The network environment is increasingly complex, and attacks against network applications and different systems constantly emerge, often combining multiple attack methods. Earlier log template extraction methods relied on the log format, and most of them were offline, which did not meet the real-time requirements of log analysis. In response to this problem, Du and Li [13] and He et al. [14] proposed Spell and Drain, respectively, for online log template extraction. The main contributions of this paper include the following: (1) On the basis of data preprocessing, this paper vectorizes log messages online using the mean-value word2vec algorithm, providing a high-quality data source for online hierarchical clustering. Because it applies ideas from natural language processing to log processing, the method does not depend on the log format; (2) This paper proposes an online log template extraction method based on online hierarchical clustering, meeting the needs of online log data processing. The remainder of this paper is organized as follows: Section 2 provides preliminaries and background, Section 3 presents the LogOHC log template extraction method, Section 4 conducts the experimental evaluation, and Section 5 summarizes the paper and proposes further research directions.
More From: EURASIP Journal on Wireless Communications and Networking