Abstract

Currently, big sensor data arise in a wide spectrum of Industry 4.0, Internet of Things, and Smart City applications. In such subject domains, sensors tend to have a high frequency and produce massive time series in a relatively short time interval. The data collected from the sensors are subject to mining in order to make strategic decisions. In this article, we consider the problem of choosing a Time Series Database Management System (TSDBMS) to provide efficient storing and mining of big sensor data. We overview InfluxDB, OpenTSDB, and TimescaleDB, which are among the most popular state-of-the-art TSDBMSs and represent different categories of such systems, namely native systems, add-ons over NoSQL systems, and add-ons over relational DBMSs (RDBMSs), respectively. Our overview shows that, at present, TSDBMSs offer a modest built-in toolset to mine big sensor data. This leads to the use of third-party mining systems and to unwanted overhead costs due to exporting data outside a TSDBMS, data conversion, and so on. We propose an approach to managing and mining sensor data inside RDBMSs that exploits the Matrix Profile concept. A Matrix Profile is a data structure that annotates a time series with, for each subsequence of the time series, the index of and the distance to its nearest neighbor, and it serves as a basis to discover motifs, anomalies, and other time-series data mining primitives. This approach is implemented as a PostgreSQL extension that allows an application programmer both to compute matrix profiles and mining primitives and to represent them as relational tables. Experimental case studies show that our approach surpasses the above-mentioned out-of-TSDBMS competitors in terms of performance, since sensor data are mined inside a TSDBMS at no significant overhead costs.
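
To make the Matrix Profile definition above concrete, the following is a minimal brute-force sketch in Python. It is not the PostgreSQL extension described in the abstract. The function names (`znorm`, `matrix_profile`), the z-normalized Euclidean distance, and the exclusion zone of half the subsequence length are assumptions borrowed from common Matrix Profile practice, and the toy series is made up for illustration.

```python
import math


def znorm(seq):
    """Z-normalize a subsequence; a constant subsequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def matrix_profile(ts, m):
    """Brute-force matrix profile in O(n^2 * m) time (illustrative only).

    Returns (profile, index), where profile[i] is the distance from the
    subsequence ts[i:i+m] to its nearest non-trivial neighbor and index[i]
    is the start position of that neighbor.
    """
    n_sub = len(ts) - m + 1
    excl = m // 2  # assumed exclusion zone that skips trivial self-matches
    subs = [znorm(ts[i:i + m]) for i in range(n_sub)]
    profile = [math.inf] * n_sub
    index = [-1] * n_sub
    for i in range(n_sub):
        for j in range(n_sub):
            if abs(i - j) <= excl:
                continue  # overlapping neighbors are not informative
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < profile[i]:
                profile[i] = d
                index[i] = j
    return profile, index


# Toy series (made up): the pattern [1, 3, 2, 5] occurs at positions 0 and 8.
ts = [1, 3, 2, 5, 4, 4, 9, 0, 1, 3, 2, 5, 4, 6]
profile, index = matrix_profile(ts, m=4)

# Smallest profile value marks a motif; largest marks an anomaly candidate.
motif = min(range(len(profile)), key=profile.__getitem__)
discord = max(range(len(profile)), key=profile.__getitem__)
print("motif pair:", motif, index[motif])
print("discord position:", discord)
```

In this sketch the pair of positions with the smallest profile value is a motif, and the position with the largest value is an anomaly (discord) candidate. The `profile` and `index` lists map naturally onto two columns of a relational table keyed by subsequence position, which is the kind of representation the abstract describes.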

Highlights

  • Big sensor data arise in a wide spectrum of Industry 4.0 [1], Internet of Things (IoT) [2], Smart City [3], and Smart Home [4] applications

  • An add-on Time Series Database Management System (TSDBMS) is implemented on top of a third-party system that provides the TSDBMS with a database engine and a data storage system

  • Out-of-TSDBMS time series data mining is potentially faster than in-TSDBMS mining, but the absence of export-import overhead costs in our approach may give it an overall advantage


Summary

Introduction

Big sensor data arise in a wide spectrum of Industry 4.0 [1], Internet of Things (IoT) [2], Smart City [3], and Smart Home [4] applications. The data obtained from sensors should be stored permanently and subjected to mining to extract hidden knowledge and make strategic decisions. In performing these tasks, a Time Series Database Management System (TSDBMS) plays a critically important role by providing an application programmer with the means and tools to efficiently process and analyze such volumes of sensor data. Despite the widespread use of NoSQL systems, RDBMSs remain the basic tools for storing and manipulating data in a wide spectrum of subject domains. This claim is supported by the statistics of the DB-Engines.com portal (see DBMS popularity broken down by database model, https://db-engines.com/en/ranking_categories, accessed on 12 April 2021), which indicate that relational DBMSs hold up to 75 percent of the market. InfluxDB supports command line and HTTP interfaces, as well as client libraries and plugins [37].

Organization of Data Storage
Query Language
Embedding Matrix Profile Management into RDBMS
Experimental Case Studies
Electricity Theft Detection
Detection of Active Electricity Consumption
Tracking the Operational Status of an Industrial Machine
Findings
Conclusions