Abstract

In recent years, the application and wide adoption of Internet of Things (IoT)-based technologies have increased the proliferation of monitoring systems, which has in turn led to an exponential increase in the amount of heterogeneous data generated. Processing and analysing this massive amount of data is cumbersome, and practice is gradually moving from classical batch processing, such as the extract, transform, load (ETL) technique, to real-time processing. For instance, in the environmental monitoring and management domain, time-series data and historical datasets are crucial for prediction models. However, the environmental monitoring domain still relies on legacy systems, which complicates real-time analysis of the essential data and integration with big data platforms, and perpetuates reliance on batch processing. As a solution, we present a distributed stream processing middleware framework for real-time analysis of heterogeneous environmental monitoring and management data, built with open source technologies and tested on a cluster in a big data environment. The system uses Kafka Connect APIs to ingest datasets from legacy systems and sensor data from heterogeneous automated weather systems, irrespective of data type, into Apache Kafka topics for processing by the Kafka stream processing engine. The stream processing engine executes predictive numerical models and algorithms expressed in event processing (EP) languages for real-time analysis of the data streams. To demonstrate the feasibility of the proposed framework, we implemented the system using a case study of drought prediction and forecasting based on the Effective Drought Index (EDI) model. First, we transform the predictive model into a form that the streaming engine can execute for real-time computing. Second, the model is applied to the ingested data streams and datasets to predict drought by persistently querying the infinite streams to detect anomalies.
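The abstract describes transforming the EDI model into a form a streaming engine can execute. The sketch below, written in plain Python rather than against the Kafka Streams API, shows one way this could look under stated assumptions. It follows the commonly cited definition of effective precipitation (the sum, over n = 1..DS, of the mean daily precipitation of the last n days), keeps only the last DS daily values in memory, and recomputes the value as each reading arrives. The calendar-day climatological mean and standard deviation are assumed to come from the historical dataset, dividing by that standard deviation is a simplification of the full EDI standardisation, and all class and function names are hypothetical rather than the authors' implementation.

```python
from collections import deque
from typing import Deque, Optional


class StreamingEffectivePrecipitation:
    """Hypothetical helper: effective precipitation over a sliding window.

    Effective precipitation = sum over n = 1..DS of (sum of the last n daily
    values) / n, where DS (duration_days) is the summation window length.
    """

    def __init__(self, duration_days: int = 365) -> None:
        if duration_days < 1:
            raise ValueError("duration_days must be positive")
        self.duration_days = duration_days
        # Most recent reading on the left; deque(maxlen=...) drops the oldest
        # reading from the right once the window is full.
        self.window: Deque[float] = deque(maxlen=duration_days)

    def update(self, precipitation_mm: float) -> Optional[float]:
        """Ingest one daily reading; return the value once DS days are available."""
        self.window.appendleft(precipitation_mm)
        if len(self.window) < self.duration_days:
            return None  # not enough history yet
        total, running_sum = 0.0, 0.0
        for n, p in enumerate(self.window, start=1):
            running_sum += p          # precipitation accumulated over the last n days
            total += running_sum / n  # mean daily precipitation over the last n days
        return total


def drought_score(effective_mm: float, climatology_mean: float, climatology_std: float) -> float:
    """Deviation from the calendar-day climatology, standardised (simplified EDI)."""
    return (effective_mm - climatology_mean) / climatology_std


def classify_drought(score: float) -> str:
    # Thresholds follow a commonly used EDI drought classification.
    if score <= -2.0:
        return "extreme drought"
    if score <= -1.5:
        return "severe drought"
    if score <= -1.0:
        return "moderate drought"
    return "no drought"


# Usage with a 3-day window (EDI is usually computed with a much longer one, e.g. 365 days).
tracker = StreamingEffectivePrecipitation(duration_days=3)
for reading in [3.0, 0.0, 0.0]:
    value = tracker.update(reading)
print(value)  # 1.0
print(classify_drought(drought_score(value, climatology_mean=5.0, climatology_std=2.0)))  # extreme drought
```

Each update costs O(DS) time and memory stays bounded by DS values, which is what allows such a query to run persistently over an unbounded stream.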
Finally, a performance evaluation of the distributed stream processing middleware infrastructure is carried out to determine the real-time effectiveness of the framework.
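The evaluation metrics themselves are not detailed in this summary. As an illustration only, the following sketch summarises end-to-end latency and throughput from hypothetical per-event timestamps (when an event entered an input topic and when its result was emitted by the stream processor), using a nearest-rank 95th percentile. The metric set, the timestamp layout and the function name are assumptions, not the paper's evaluation method.

```python
import math
from typing import Dict, List, Tuple


def summarize_run(samples: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarise a benchmark run.

    samples: hypothetical (ingested_ms, emitted_ms) pairs, one per event.
    """
    if not samples:
        raise ValueError("need at least one sample")
    latencies = sorted(emitted - ingested for ingested, emitted in samples)
    n = len(latencies)
    # Nearest-rank percentile: smallest latency L such that >= 95% of events have latency <= L.
    p95 = latencies[math.ceil(0.95 * n) - 1]
    # Wall-clock span from the first ingestion to the last emitted result.
    span_ms = max(e for _, e in samples) - min(i for i, _ in samples)
    return {
        "mean_latency_ms": sum(latencies) / n,
        "p95_latency_ms": p95,
        "max_latency_ms": latencies[-1],
        "throughput_events_per_s": n / (span_ms / 1000.0) if span_ms > 0 else math.inf,
    }


stats = summarize_run([(0, 10), (100, 120), (200, 230), (300, 340)])
print(stats["mean_latency_ms"], stats["p95_latency_ms"])  # 25.0 40
```

Reporting a high percentile alongside the mean is a common choice because occasional slow events, which matter for real-time alerting, are hidden by an average.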

Highlights

  • The emergence of the Internet of Things (IoT) has enabled the adoption and development of several real-time monitoring systems for diverse spheres of life such as energy management, health, smart environment, manufacturing, and security

  • To tackle some limitations and harness the benefits of the discussed works, we propose in this paper a distributed solution that employs stream processing techniques, resource optimisation and a multi-tenancy approach for real-time processing of heterogeneous data sources in the environmental monitoring and management domain

  • We presented the application of a distributed stream processing framework for the real-time big data analysis of heterogeneous environmental management and monitoring data using
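Processing heterogeneous sources irrespective of data type ultimately means mapping each source onto one record schema before analysis. In the described system, ingestion is handled by Kafka Connect; the sketch below only mimics the normalisation idea in plain Python. Both input formats (a legacy CSV line and an automated weather station JSON payload), their field names and the `Reading` schema are hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass

MM_PER_INCH = 25.4


@dataclass(frozen=True)
class Reading:
    """Hypothetical common schema that every source is mapped onto."""
    station_id: str
    date: str  # ISO 8601 date, e.g. "2019-01-31"
    precipitation_mm: float


def from_legacy_csv(line: str) -> Reading:
    # Hypothetical legacy export format: station_id,date,rain_mm
    station_id, date, rain_mm = next(csv.reader(io.StringIO(line)))
    return Reading(station_id.strip(), date.strip(), float(rain_mm))


def from_weather_station_json(payload: str) -> Reading:
    # Hypothetical weather-station format with its own field names and units.
    doc = json.loads(payload)
    value = float(doc["precip"]["value"])
    if doc["precip"].get("unit", "mm") == "in":
        value *= MM_PER_INCH
    return Reading(doc["stationId"], doc["timestamp"][:10], value)


def normalize(raw: str) -> Reading:
    # Dispatch on payload shape so downstream consumers see a single schema.
    if raw.lstrip().startswith("{"):
        return from_weather_station_json(raw)
    return from_legacy_csv(raw)


print(normalize("ST01, 2019-01-31, 4.5"))
print(normalize('{"stationId": "AWS7", "timestamp": "2019-01-31T06:00:00Z", '
                '"precip": {"value": 0.5, "unit": "in"}}'))
```

Converting units at the edge keeps the analytical models (such as the drought index) free of per-source special cases.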


Introduction

Sensors 2020, 20, 3166

The emergence of the Internet of Things (IoT) has enabled the adoption and development of several real-time monitoring systems for diverse spheres of life such as energy management, health, smart environment, manufacturing, and security. In the environmental management and monitoring domain, ubiquitous sensors, actuators and instruments provide real-time data acquisition and data logging with telemetry capabilities [2]. These devices continuously generate an avalanche of unbounded data streams related to the current status of the deployed environment. This data represents big data, which has the potential to provide more meaningful insight towards the timely understanding of complex environmental phenomena if properly analysed in real time [3,4,5,6,7]. Despite these potential benefits, building a real-time data analytics system is still challenging due to the variety of data, the high speed of data generation, the volume of data to be processed, and the lack of a reliable, scalable and interactive platform [8,9,10]. Stream processing enhances analytic functionality, which provides the necessary meaningful insight from IoT data and increases the productivity of processes that rely on real-time data [3,6,12].
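To make the idea of continuously analysing unbounded data streams concrete, the following minimal sketch computes per-station sums over fixed, non-overlapping (tumbling) time windows and emits each result as soon as its window closes, so memory stays bounded however long the stream runs. It assumes events arrive in timestamp order; production engines such as Kafka Streams additionally deal with late and out-of-order records. The event layout and function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Optional, Tuple

Event = Tuple[int, str, float]  # hypothetical layout: (epoch_seconds, station_id, value)


def tumbling_window_sums(events: Iterable[Event], window_s: int) -> Iterator[Tuple[int, Dict[str, float]]]:
    """Yield (window_start, {station: sum}) each time a window closes.

    Assumes events are in timestamp order. Only the currently open window is
    kept in memory, so the input may be an unbounded iterator. Windows that
    receive no events are not emitted.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    current_start: Optional[int] = None
    totals: Dict[str, float] = defaultdict(float)
    for ts, station, value in events:
        start = ts - ts % window_s  # align the timestamp to its window boundary
        if current_start is not None and start != current_start:
            yield current_start, dict(totals)  # previous window is complete
            totals.clear()
        current_start = start
        totals[station] += value
    if current_start is not None:
        yield current_start, dict(totals)  # flush the last window when input ends


events = [(0, "A", 1.0), (30, "B", 2.0), (59, "A", 0.5), (60, "A", 4.0), (125, "B", 1.0)]
for start, sums in tumbling_window_sums(events, window_s=60):
    print(start, sums)
```

Because the function is a generator, a consumer can act on each window result (for example, an anomaly check) the moment it is produced instead of waiting for a batch to finish.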
