Abstract

Log anomaly detection is an efficient method to manage modern large-scale Internet of Things (IoT) systems. A growing number of works apply natural language processing (NLP) methods, in particular word2vec, to log feature extraction. Word2vec can extract the relevance between words and vectorize the words. However, the computational cost of training word2vec is high. Anomalies in logs depend not only on an individual log message but also on the log message sequence. Therefore, the word vectors from word2vec cannot be used directly; they must be transformed into log event vectors and then into log sequence vectors. To reduce computational cost and avoid multiple transformations, in this paper we propose an offline feature extraction model, named LogEvent2vec, which takes the log event as the input of word2vec to extract the relevance between log events and vectorize log events directly. LogEvent2vec can work with any coordinate transformation method and anomaly detection model. After obtaining the log event vectors, we transform them into log sequence vectors by bary or tf-idf, and three kinds of supervised models (Random Forests, Naive Bayes, and Neural Networks) are trained to detect anomalies. We have conducted extensive experiments on a real public log dataset from BlueGene/L (BGL). The experimental results demonstrate that, compared with word2vec, LogEvent2vec can reduce computational time by a factor of 30 and improve accuracy. LogEvent2vec with bary and Random Forest achieves the best F1-score, and LogEvent2vec with tf-idf and Naive Bayes needs the least computational time.
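
The step from log event vectors to log sequence vectors can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "bary" denotes the barycenter (arithmetic mean) of the event vectors in a sequence, and that the tf-idf variant is a sum of event vectors weighted by term frequency times inverse document frequency, without further normalization. The function names, the toy event vectors, and the toy sequences are hypothetical; in practice the event vectors would come from a word2vec model trained on sequences of log events.

```python
import math
from collections import Counter


def bary(sequence, event_vectors):
    """Barycenter (assumed meaning of 'bary'): mean of the event vectors.

    Assumes a non-empty sequence whose events all appear in event_vectors.
    """
    dim = len(next(iter(event_vectors.values())))
    total = [0.0] * dim
    for event in sequence:
        for i, x in enumerate(event_vectors[event]):
            total[i] += x
    return [x / len(sequence) for x in total]


def idf_weights(sequences):
    """Inverse document frequency of each event, one 'document' per sequence."""
    n = len(sequences)
    doc_freq = Counter(event for seq in sequences for event in set(seq))
    return {event: math.log(n / df) for event, df in doc_freq.items()}


def tfidf(sequence, event_vectors, idf):
    """Sum of event vectors weighted by tf * idf (assumed weighting scheme)."""
    dim = len(next(iter(event_vectors.values())))
    total = [0.0] * dim
    for event, count in Counter(sequence).items():
        weight = (count / len(sequence)) * idf[event]
        for i, x in enumerate(event_vectors[event]):
            total[i] += weight * x
    return total


# Hypothetical 2-dimensional event vectors and log sequences.
event_vectors = {"E1": [1.0, 0.0], "E2": [0.0, 1.0], "E3": [1.0, 1.0]}
sequences = [["E1", "E2", "E1"], ["E1", "E3"], ["E2", "E2"]]

idf = idf_weights(sequences)
bary_features = [bary(seq, event_vectors) for seq in sequences]
tfidf_features = [tfidf(seq, event_vectors, idf) for seq in sequences]
```

Either feature list has one fixed-length vector per log sequence, so it could be passed to any supervised classifier, such as the Random Forest, Naive Bayes, or Neural Network models named above.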

Highlights

  • Internet of Things (IoT) [1,2] has made it possible to deploy tiny, cheap, available, and durable devices that can collect various data in real time, with continuous supply [3,4,5,6,7]. Because IoT devices are vulnerable and usually deployed in harsh and extreme natural environments, solutions that can improve monitoring services and the security of IoT devices are needed [8,9,10]

  • LogEvent2vec can work with any coordinate transformation method and anomaly detection model

  • We show that our feature extraction algorithm can work well with various anomaly detection methods


Summary

Introduction

Internet of Things (IoT) [1,2] has made it possible to deploy tiny, cheap, available, and durable devices that can collect various data in real time, with continuous supply [3,4,5,6,7]. Because IoT devices are vulnerable and usually deployed in harsh and extreme natural environments, solutions that can improve monitoring services and the security of IoT devices are needed [8,9,10]. Logs record the states and events of the devices and systems, providing a valuable source of information that can be exploited for both research and industrial purposes. The reason is that the large amount of log data stored in such devices can be analyzed to observe user behavior patterns or to detect errors in the system. Better IoT solutions can then be developed or updated and presented to the user [11]. Logs are one of the most valuable data sources for device management, root cause analysis, and updating IoT solutions. Log anomaly detection is the part of log analysis that analyzes log messages to detect anomalous states caused by sensor hardware failure, energy exhaustion, or the environment [13].

Sensors 2020, 20, 2451

