Abstract

Selecting the right features for further data analysis is important in equipment anomaly detection, especially when the original data source involves high-dimensional data with a low value density. However, existing research has failed to capture the fact that sensor data are usually correlated (e.g., from redundantly deployed sensors), and that these correlations are broken when anomalies occur in the monitored equipment. In this paper, we propose to capture such changes in sensor data correlation to improve the performance of IoT (Internet of Things) equipment anomaly detection. In our feature selection method, we first cluster correlated sensors together according to their data correlations in order to recognize the redundantly deployed sensors, and we then monitor the data correlations in real time and select the sensors whose correlations change as the representative features for anomaly detection. To that end, we (1) conduct curve alignment for the sensor clustering; (2) discuss the appropriate window size for data correlation calculation; and (3) adopt MCFS (Multi-Cluster Feature Selection) in our method to suit the online feature selection scenario. In an experimental evaluation on data from real IoT equipment, we show that our method reduces the false negatives of IoT equipment anomaly detection by 30% with almost the same level of false positives.
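
The core idea of the abstract (group correlated sensors, then watch for sensors whose correlation with their group breaks) can be illustrated with a minimal sketch. This is not the authors' implementation: it omits curve alignment and MCFS, and the function names, the 0.9 clustering threshold, the 0.6 drop threshold, the window lengths, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def cluster_correlated(baseline, threshold=0.9):
    """Group sensors (columns) whose |Pearson r| on baseline data reaches
    `threshold`, using connected components of the thresholded graph."""
    corr = np.corrcoef(baseline, rowvar=False)
    n = corr.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and abs(corr[i, j]) >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels, corr

def sensors_with_correlation_change(window, labels, baseline_corr, drop=0.6):
    """Return sensors whose mean |r| to their cluster peers, computed on
    `window`, fell by more than `drop` relative to the baseline."""
    win_corr = np.corrcoef(window, rowvar=False)
    selected = []
    for i, label in enumerate(labels):
        peers = [j for j, other in enumerate(labels) if other == label and j != i]
        if not peers:
            continue  # a sensor with no correlated peers cannot be checked
        before = np.mean(np.abs(baseline_corr[i, peers]))
        after = np.mean(np.abs(win_corr[i, peers]))
        if before - after > drop:
            selected.append(i)
    return selected

# Synthetic data (hypothetical): sensors 0-3 observe the same signal,
# sensor 4 is unrelated noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 20, 400)
signal = np.sin(t)
data = np.column_stack(
    [signal + rng.normal(0, 0.05, t.size) for _ in range(4)]
    + [rng.normal(0, 1, t.size)]
)
# Simulated anomaly: sensor 1 decouples from the signal in the second half.
data[200:, 1] = rng.normal(0, 1, 200)

labels, baseline_corr = cluster_correlated(data[:200])
selected = sensors_with_correlation_change(data[200:], labels, baseline_corr)
print("cluster labels:", labels)
print("sensors with broken correlation:", selected)
```

In an online setting the second function would be called on each new sliding window; the window must be long enough for the correlation estimate to be stable, which is why the choice of window size matters.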

Highlights

  • In the era of the fourth industrial revolution, there is a growing trend to deploy sensors on industrial equipment and to analyze the equipment’s running status according to the sensor data.

  • The performance of the learning model is seriously affected by the features selected, so feature selection plays a critical role in such methods.

  • Our method improves the detection accuracy to almost the same level as that achieved with the original data set, indicating that it selects a similar set of features that are useful for anomaly detection.

Summary

Introduction

In the era of the fourth industrial revolution, there is a growing trend to deploy sensors on industrial equipment and to analyze the equipment’s running status according to the sensor data. Owing to the considerable development of deep learning in recent years, a common practice for such analysis is to apply deep learning [2,3,4]. Such methods select a subset of all fetched sensor data streams as the input features and generate predictions about the equipment. To select an appropriate set of features for the learning model, researchers aim either to select the features most relevant to the prediction model, to improve prediction performance, or to select the most informative features, to achieve data reduction. Both kinds of methods have intrinsic drawbacks when applied in online scenarios. The former kind depends heavily on predefined evaluation criteria, such as feature relevance metrics [5] or a predefined learning model [6].
