Abstract

The Internet of Things (IoT) is emerging, and 5G enables far more data to be transported from mobile and wireless sources. The volume of data to be transmitted exceeds the available link capacity. Labelling the data and transmitting only the useful part of the collected data, or its features, is a promising solution to this challenge. Abnormal data are valuable, compared with the already overflowing normal data, because they are needed to train models and to detect anomalies. Labelling can be done at the data sources or at the edges to balance load and computation among sources, edges, and centres. However, unsupervised labelling remains a challenge that prevents these solutions from being implemented. Two main problems in unsupervised labelling are long-term dynamic multiseasonality and heteroscedasticity. This paper proposes a data-driven method to handle the modelling and heteroscedasticity problems. The method consists of the following main steps. First, raw data are preprocessed and grouped. Second, main models are built for each group. Third, the models are adapted back to the original measured data to obtain raw residuals. Fourth, the raw residuals go through deheteroscedasticity and become normalized residuals. Finally, the normalized residuals are used for anomaly detection. Experimental results with real-world data show that our method increases the area under the receiver operating characteristic curve (AUC) by about 30%.
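
The five steps listed in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the per-group model (a group median), the way heteroscedasticity is removed (dividing residuals by a robust spread estimated per model-level bin), and every name and parameter below (`normalized_residuals`, `detect`, `n_level_bins`, `threshold`) are assumptions chosen only to make the flow of data concrete.

```python
import numpy as np

def normalized_residuals(values, group_ids, n_level_bins=4):
    """Group-wise model, raw residuals, then level-dependent rescaling (illustrative only)."""
    values = np.asarray(values, dtype=float)
    group_ids = np.asarray(group_ids)

    # Steps 1-2: data are already grouped; fit one simple model per group.
    # Assumption: the "model" is just the group median.
    model = np.empty_like(values)
    for g in np.unique(group_ids):
        mask = group_ids == g
        model[mask] = np.median(values[mask])

    # Step 3: compare the models with the original measurements.
    raw = values - model

    # Step 4: remove heteroscedasticity. Assumption: the residual spread
    # depends on the model level, so rescale by a robust spread (MAD) per level bin.
    edges = np.quantile(model, np.linspace(0, 1, n_level_bins + 1)[1:-1])
    bins = np.digitize(model, edges)
    norm = np.empty_like(raw)
    for b in np.unique(bins):
        mask = bins == b
        mad = np.median(np.abs(raw[mask] - np.median(raw[mask])))
        scale = 1.4826 * mad  # MAD scaled to match a Gaussian standard deviation
        norm[mask] = raw[mask] / scale if scale > 0 else 0.0
    return raw, norm

def detect(norm, threshold=3.0):
    # Step 5: flag points whose normalized residual is large in magnitude.
    return np.abs(norm) > threshold

# Low-level group with one injected anomaly (20), high-level group with large normal swings.
values = [9, 11, 9, 11, 9, 11, 10, 20, 90, 110, 90, 110, 90, 110, 100, 100]
groups = [0] * 8 + [1] * 8
raw, norm = normalized_residuals(values, groups, n_level_bins=2)
flags = detect(norm)
```

In this toy series the largest raw residual belongs to an ordinary point in the high-level group, while after rescaling only the injected anomaly in the low-level group exceeds the threshold. This is the kind of effect the deheteroscedasticity step is meant to achieve.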

Highlights

  • Together with the rapid development of 5G, the connection requirements of wireless devices are growing due to easier connectivity and much shorter delay

  • DOW combined with flow level heteroscedasticity (FLH) is preferred over DOW without FLH because it yields fewer false positives at the optimal cut-off point, which matters given the data's sensitivity to false positives

  • The experimental results show that the proposed DOW algorithm matches multiseasonal time series patterns well, and that FLH can solve the heteroscedasticity problem
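
The evaluation terms used here (AUC, optimal cut-off point, false positives) can be made concrete with a short sketch. The source does not say how the optimal cut-off is chosen, so the use of Youden's J statistic (maximising true positive rate minus false positive rate) is an assumption. The function names and the toy scores are hypothetical.

```python
import numpy as np

def roc_points(scores, labels):
    """Return thresholds (descending) with the matching false/true positive rates."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    thresholds = np.unique(scores)[::-1]
    tpr = np.array([(scores[labels] >= t).mean() for t in thresholds])
    fpr = np.array([(scores[~labels] >= t).mean() for t in thresholds])
    return thresholds, fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule, anchored at (0,0) and (1,1)."""
    f = np.concatenate([[0.0], fpr, [1.0]])
    t = np.concatenate([[0.0], tpr, [1.0]])
    return float(np.sum(np.diff(f) * (t[1:] + t[:-1]) / 2))

def youden_cutoff(thresholds, fpr, tpr):
    """Assumed cut-off rule: the threshold that maximises TPR - FPR."""
    return float(thresholds[np.argmax(tpr - fpr)])

# Toy anomaly scores; label 1 marks a true anomaly.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
thresholds, fpr, tpr = roc_points(scores, labels)
area = auc(fpr, tpr)
cutoff = youden_cutoff(thresholds, fpr, tpr)
```

When false positives are costly, as the highlight above suggests, the false positive rate at the chosen cut-off is the figure to compare between detectors, alongside the overall AUC.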


Introduction

Together with the rapid development of 5G, the connection requirements of wireless devices are growing due to easier connectivity and much shorter (in milliseconds) delay. As many mobile vehicles are connected to the IoT network as data sources [3], much more data is produced. In one respect, this is an opportunity for machine learning-based data processing methods. IoT data processing happens across the entire system [9]. The sensed data are sent over the available network routes, which could be fully used for distributed processing [11], especially together with the application layer [12,13,14]. When the data arrive at the centre, data mining algorithms can be applied [17] to analyse them and, in most cases, to make predictions.
