Abstract

Multi-source Internet of Things (IoT) data archived in institutional repositories are increasingly being opened up and made publicly accessible to scientists, developers, and decision makers via web services, in order to promote research on geohazard prevention. In this paper, we design and implement a big data-turbocharged system for effective IoT data management that follows the data lake architecture. We first propose a multi-threaded parallel ingestion method that pulls IoT data from institutions’ data repositories. Next, we design storage strategies for both ingested and processed IoT data so that they are kept in a scalable, reliable storage environment. We also build a distributed cache layer to enable fast access to IoT data. We then provide users with a unified, SQL-based interactive environment for IoT data exploration that leverages the processing capability of Apache Spark. In addition, we design a standards-based metadata model to describe ingested IoT data and thus support IoT dataset discovery. Finally, we implement a prototype system and conduct experiments on real IoT data repositories to evaluate the efficiency of the proposed system.

Highlights

  • We present an Internet of Things (IoT) data management system that adopts the data lake architecture and lets researchers ingest, manage, query, and explore multi-source IoT big data archived in distributed repositories and accessed via web services

  • We start by evaluating the storage space consumed by the IoT data ingested from the data sources

  • Each data source yields a number of IoT data files stored in JSON format; the number of files is determined by the number of HTTP GET requests in the data ingestion job for that source
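
To make the relationship between HTTP GET requests and JSON files concrete, here is a minimal sketch of a multi-threaded ingestion job that writes one JSON file per request. It is not the authors' implementation: the function names (`ingest_source`, `fake_fetch`), the `part-NNNNN.json` file naming, the example URLs, and the local-directory target are all assumptions made for illustration. The `fetch` callable stands in for a real HTTP GET so the sketch runs offline.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Callable, List


def ingest_source(
    source: str,
    request_urls: List[str],
    fetch: Callable[[str], dict],
    out_dir: Path,
    max_workers: int = 4,
) -> List[Path]:
    """Issue one request per URL in parallel; write one JSON file per response."""
    target = out_dir / source
    target.mkdir(parents=True, exist_ok=True)

    def fetch_and_store(indexed_url):
        index, url = indexed_url
        # Hypothetical: `fetch` wraps an HTTP GET and returns parsed JSON.
        payload = fetch(url)
        # Assumed naming scheme: one file per request, numbered by request index.
        path = target / f"part-{index:05d}.json"
        path.write_text(json.dumps(payload), encoding="utf-8")
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in the order of the input URLs.
        return list(pool.map(fetch_and_store, enumerate(request_urls)))


def fake_fetch(url: str) -> dict:
    """Stand-in for a real HTTP GET so the example needs no network access."""
    return {"url": url, "observations": [{"temperature": 21.5}]}


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Assumed paging scheme; a real repository defines its own request URLs.
        urls = [f"https://example.org/observations?page={i}" for i in range(3)]
        files = ingest_source("station-a", urls, fake_fetch, Path(tmp))
        print(len(files), "files written for", len(urls), "requests")
```

Under these assumptions, the number of files per source equals the number of requests in the job, so storage consumption can be estimated per source by summing the sizes of the files in its directory.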


Summary

Introduction

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

With the advancement of data acquisition technologies, geohazard data are generated at a staggering rate, helping developers and decision makers understand and simulate geohazards and perceive their harmful implications for human beings [2,3,4]. Internet of Things-based monitoring has become a convenient, vital method for geohazard prevention [5,6]. Organizations and governments have deployed a plethora of devices to realize long-term, fast monitoring of features (e.g., temperature, rainfall) in geohazard bodies. The resulting deluge of IoT data, obtained and managed by these institutions, is stored in independent data silos that are distributed across many locations.
