Abstract

Data generated by real-world systems, particularly cyber-physical systems, is full of noise, packet loss, and other imperfections. However, most intrusion detection, anomaly detection, monitoring, and mining algorithms and frameworks assume that the data provided is of perfect quality. As a result, these algorithms tend to perform extremely well in controlled lab environments but fail in the real world. We propose a method for accurately restoring discrete temporal or sequential system traces affected by data loss, using Word2vec's Continuous Bag of Words (CBOW) model. The model learns to predict the next event in a sequence of events and feeds its output back into itself for subsequent predictions. Such a method can reconstruct even long sequences of missing events and helps validate and improve data quality for noisy data. The restored traces are very close to the real data and can be used by algorithms that depend on real data for system analysis. We demonstrate our method by reconstructing traces from the QNX real-time operating system, which consist of long sequences of discrete events. We show that, given even small parts of a QNX trace, our CBOW model can predict future events with an accuracy of almost 90%, outperforming a Markov model benchmark.
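The sketch below illustrates the autoregressive reconstruction idea described above: train a CBOW model over event traces, then repeatedly predict the next event from a trailing context window and feed the prediction back in. It is a minimal illustration, not the authors' implementation; the event names, hyperparameters, and the use of gensim's predict_output_word (which predicts a centre word from its context) are assumptions made for the example.

```python
from gensim.models import Word2Vec

# Hypothetical discrete event traces (lists of event identifiers).
traces = [
    ["MSG_SEND", "MSG_RECV", "THREAD_RUN", "INT_ENTRY", "INT_EXIT"] * 20,
    ["THREAD_RUN", "MSG_SEND", "MSG_RECV", "THREAD_READY"] * 25,
]

# Train a CBOW model (sg=0) on the event sequences.
# vector_size, window, and epochs are illustrative choices, not the paper's values.
model = Word2Vec(
    sentences=traces,
    vector_size=64,
    window=5,
    min_count=1,
    sg=0,          # 0 selects CBOW
    negative=5,    # negative sampling enables predict_output_word
    epochs=50,
)

def reconstruct(prefix, n_missing, context_len=5):
    """Predict n_missing events after a known prefix, feeding each
    prediction back in as context for the next step."""
    events = list(prefix)
    for _ in range(n_missing):
        context = events[-context_len:]
        candidates = model.predict_output_word(context, topn=1)
        if not candidates:
            break
        events.append(candidates[0][0])  # most probable next event
    return events[len(prefix):]

# Usage: restore a gap of three events after a partial trace.
print(reconstruct(["MSG_SEND", "MSG_RECV", "THREAD_RUN"], n_missing=3))
```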
