Abstract

This paper introduces a generic and scalable anomaly detection framework. Anomaly detection can improve operation and maintenance efficiency and help ensure that experiments are carried out effectively. The framework supports common tasks for machine learning-based anomaly detection methods: data sample building, retagging and visualization, deviation measurement, and performance measurement. The samples we use are sourced from Ganglia monitoring data. The framework includes several anomaly detection methods that handle spatial and temporal anomalies. Finally, we show a preliminary application of the framework to Lustre distributed file systems in daily operation and maintenance.

Highlights

  • At present, the Institute of High Energy Physics (IHEP) local cluster consists of 20,000 CPU slots, hundreds of data servers, 20 PB disk storage and 10 PB tape storage

  • After data taking begins at the Jiangmen Underground Neutrino Observatory (JUNO) and the Large High Altitude Air Shower Observatory (LHAASO) [1] experiments, the data volume processed at this center will approach 10 PB per year

  • We develop a generic anomaly detection framework based on machine learning

Summary

Introduction

We develop a generic anomaly detection framework based on machine learning. Spatial anomalies are anomalous points in high-dimensional data considered without a time dimension. Temporal anomalies may not be spatial anomalies, but their temporal characteristics differ markedly from those of the surrounding sequence data. The prediction methods of the fourth category are based on deep learning [8], such as Hierarchical Temporal Memory (HTM) [9, 10]. The framework, which we developed in Python, can be extended with both statistical machine learning algorithms and deep learning algorithms. It provides functions such as data sample building, retagging and visualization, deviation measurement, and performance measurement for machine learning-based anomaly detection methods.
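To make the notion of a temporal anomaly and its deviation measurement concrete, the following is a minimal sketch and not the framework's own implementation. It assumes a simple rolling-window predictor (the mean of the preceding points) and scores each new point by its distance from that prediction in units of the window's standard deviation. The function names `rolling_deviation` and `flag_anomalies`, the window size, the threshold, and the sample values are all hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev


def rolling_deviation(series, window=5):
    """Score each point after the first `window` points by how many
    standard deviations it lies from the mean of the preceding window.

    Assumption: the rolling mean stands in for a real prediction model.
    """
    scores = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma > 0:
            scores.append(abs(series[i] - mu) / sigma)
        else:
            # Flat window: any departure from the constant value is
            # treated as an unbounded deviation.
            scores.append(0.0 if series[i] == mu else float("inf"))
    return scores


def flag_anomalies(series, window=5, threshold=3.0):
    """Return the indices of points whose deviation exceeds `threshold`."""
    scores = rolling_deviation(series, window)
    return [i + window for i, s in enumerate(scores) if s > threshold]


# Hypothetical metric samples (not real Ganglia data) with one spike.
load = [1.0, 1.1, 0.9, 1.0, 1.2, 1.1, 1.0, 9.0, 1.1, 1.0]
print(flag_anomalies(load))
```

In this sketch only the spike is flagged. The points that follow it are not, because the spike inflates the standard deviation of the windows that contain it; a more robust statistic or a learned predictor would behave differently.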

Architecture
Time-series features extraction
Prediction algorithms
Detection algorithms
Isolation Forest
Prediction Experiments
Conclusion