Abstract

Outlier detection on data streams identifies unusual states in order to sense and warn of potential risks and faults in target systems in both the cyber and physical worlds. Because different parameter settings of machine learning algorithms can yield dramatically different performance, automatic parameter selection is also of great importance when deploying outlier detection algorithms on data streams. However, current canonical parameter selection methods face two key challenges: (i) data streams generally evolve over time, but existing methods use a fixed training set, which cannot handle this evolving environment and often leads to suboptimal parameter recommendations; (ii) the stream is infinite, so any parameter selection method that takes the entire stream as input is infeasible. To address these limitations, this paper introduces a Dynamic Parameter Selection method for outlier detection on data Streams (DPSS for short). DPSS uses Gaussian process regression to model the relationship between parameters and detection performance, and uses Bayesian optimization to search for the optimal parameter setting. For each new subsequence, DPSS updates the recommended parameter setting to suit the evolving characteristics of the stream. In addition, DPSS uses only historical calculation results to guide the sampling of parameter settings and to adjust the Gaussian process regression results. DPSS can be employed as an auxiliary plug-in tool to improve the detection performance of outlier detection methods. Extensive experiments show that, compared to its counterparts, our method significantly improves the F-score of outlier detectors on data streams, and that it achieves better parameter selection performance than other state-of-the-art parameter selection approaches. DPSS also achieves better time and memory efficiency than its competitors.
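The abstract describes the core loop only at a high level: Gaussian process regression models detection performance as a function of the parameters, Bayesian optimization chooses which setting to try next, and results from earlier subsequences inform the search on each new subsequence. The following is a minimal pure-Python sketch of that general loop, not the DPSS implementation. It rests on several assumptions that the abstract does not state: a single normalized parameter searched over a fixed grid, an RBF kernel with hand-set hyperparameters, an upper-confidence-bound (UCB) acquisition function, and historical observations reused with an inflated noise term (`stale_noise`) as one possible way to down-weight results that may predate drift. All names (`select_parameter`, `toy_f_score`, `stale_noise`, and so on) are hypothetical.

```python
import math


def rbf(a, b, length=0.15, var=1.0):
    # Squared-exponential (RBF) kernel between two scalar parameter values.
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K):
    # Lower-triangular L with L L^T = K; K must be symmetric positive definite.
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def forward(L, b):
    # Solve L y = b by forward substitution.
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y


def backward(L, y):
    # Solve L^T x = y by back substitution.
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_posterior(xs, ys, noises, x_new):
    # GP regression: posterior mean and standard deviation of the score at x_new.
    y_mean = sum(ys) / len(ys)  # constant prior mean = average observed score
    K = [[rbf(a, b) + (noises[i] if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward(L, forward(L, [y - y_mean for y in ys]))  # K^{-1}(y - mean)
    k_star = [rbf(x_new, a) for a in xs]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = forward(L, k_star)
    var = max(rbf(x_new, x_new) - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)


def select_parameter(score_fn, candidates, history=(), n_iter=6,
                     beta=2.0, stale_noise=0.05, fresh_noise=1e-4):
    # Bayesian optimization over `candidates` for ONE subsequence.
    # `history` holds (param, score) pairs from earlier subsequences: they
    # warm-start the GP but get a larger noise term (an assumption) because
    # the stream may have drifted since they were measured.
    xs = [p for p, _ in history]
    ys = [s for _, s in history]
    noises = [stale_noise] * len(xs)
    fresh = {}  # param -> score measured on the current subsequence
    for _ in range(min(n_iter, len(candidates))):
        pool = [c for c in candidates if c not in fresh]

        def ucb(c):
            # Upper-confidence-bound acquisition: posterior mean + beta * std.
            m, s = gp_posterior(xs, ys, noises, c)
            return m + beta * s

        nxt = max(pool, key=ucb) if xs else pool[len(pool) // 2]
        fresh[nxt] = score_fn(nxt)
        xs.append(nxt)
        ys.append(fresh[nxt])
        noises.append(fresh_noise)
    best = max(fresh, key=fresh.get)  # recommend only from fresh evaluations
    return best, fresh


# Hypothetical stand-in for "F-score of a detector on the current subsequence"
# as a function of one normalized parameter.
def toy_f_score(p):
    return 1.0 - (p - 0.7) ** 2


grid = [i / 20 for i in range(21)]
best, evaluated = select_parameter(toy_f_score, grid,
                                   history=[(0.3, 0.6), (0.6, 0.9)], n_iter=8)
print(best, round(evaluated[best], 3))
```

In a streaming setting, the `evaluated` pairs from one subsequence could be passed as `history` for the next (for example `history=list(evaluated.items())`), so each search starts from prior knowledge instead of from scratch and never needs the raw stream itself. Inflating the noise on stale points is only one way to discount outdated results; the paper's own adjustment of the regression results may work differently.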
