Abstract

This thesis investigates the state of the environmental data lifecycle in the Internet of Things (IoT) era. We focus on two IoT stressors: a) the resource-constrained ecosystem and b) syntactic and semantic heterogeneity, and investigate their impact on environmental timeseries storage, dissemination, acquisition and integration. We argue that heterogeneity, together with the limited capabilities of IoT devices, renders past best practices for the environmental timeseries lifecycle not directly applicable. This thesis addresses the following research questions: Can the environmental timeseries lifecycle be facilitated by IoT prototyping devices? Are environmental data dissemination protocols IoT-ready? How can e-scientists acquire, integrate and transform environmental timeseries datasets in the heterogeneous IoT ecosystem? In light of the resource-constrained IoT ecosystem, we investigate whether a) IoT prototyping devices can facilitate the environmental timeseries lifecycle, and b) environmental data dissemination protocols are IoT-ready. Chapter 2 presents our research on supporting resilient data storage on IoT prototyping devices. We focus on the Raspberry Pi as an IoT prototyping device and explore its capabilities for resilient data storage, interoperable data dissemination through established standards, and performance under concurrent requests from external clients. Chapter 3 presents our efforts to make an established environmental data dissemination protocol IoT-compatible. We focus on the OGC Sensor Observation Service (SOS) and argue that it was not designed to operate efficiently in the IoT-enabling ecosystem. We designed and implemented a backwards-compatible extension that renders OGC SOS disruption-tolerant and supports resource economization. In light of data heterogeneity, which is amplified in the IoT era, we explore new methodologies to support e-scientists in acquiring, integrating and transforming environmental timeseries datasets.
We argue that current approaches to facilitating these environmental data lifecycle processes have certain limitations, because, in addition to legacy environmental datasets, new IoT-produced datasets are increasingly used in environmental campaigns. These datasets are not always properly annotated and/or report their data in custom formats, which renders contemporary data acquisition and integration approaches not directly applicable. Chapter 4 reviews these approaches and proposes a declarative one to support e-scientists towards universal acquisition and integration of syntactically heterogeneous timeseries datasets. Our declarative approach is founded on templates, which are abstract descriptions of a dataset's syntax using programming-language-agnostic semantics. We argue that templates offer a compromise between generality and simplicity, as e-scientists with different computer literacy profiles can develop them. We demonstrate the syntactic interoperability capabilities of our approach with several case studies spanning different environmental domains (i.e. meteorology, agriculture, urban air quality and hydrology). Chapter 5 extends this declarative approach with a reasoner to support semantic operations. We focus on one semantic heterogeneity challenge: the different units of measurement in which observables are reported. Using user-defined semantic annotations, the reasoner determines the compatibility among datasets that are a) formatted with different syntaxes, b) annotated with custom semantics and c) reported in different units of measurement. We demonstrate the semantic interoperability capabilities of our approach in a case study where we transform the syntactically and semantically heterogeneous meteorological input files of four agricultural models, performing on-the-fly unit of measurement transformation where applicable.
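To make the idea of a syntax template plus unit reasoning more concrete, the following is a minimal Python sketch under stated assumptions. It is not the thesis's template language or reasoner: the `Template` structure, the unit table `UNITS`, the functions `parse`, `compatible`, `convert` and `harmonize`, and the two sample station files are all hypothetical. Units are assumed to convert linearly (scale plus offset) to one base unit per dimension, and two units are treated as compatible only if they share a dimension.

```python
import csv
from dataclasses import dataclass

# Hypothetical unit table: unit -> (dimension, scale, offset) such that
# value_in_base_unit = value * scale + offset (base units: K and m).
UNITS = {
    "degC": ("temperature", 1.0, 273.15),
    "K": ("temperature", 1.0, 0.0),
    "degF": ("temperature", 5.0 / 9.0, 273.15 - 32.0 * 5.0 / 9.0),
    "mm": ("length", 0.001, 0.0),
    "cm": ("length", 0.01, 0.0),
    "m": ("length", 1.0, 0.0),
}


@dataclass
class Template:
    """Hypothetical declarative description of a delimited dataset's syntax."""
    delimiter: str
    skip_lines: int
    # column index -> (observable name, unit or None for non-numeric fields)
    columns: dict


def parse(text: str, template: Template) -> list:
    """Read records using only the template, with no dataset-specific code."""
    lines = text.splitlines()[template.skip_lines:]
    records = []
    for row in csv.reader(lines, delimiter=template.delimiter):
        if not row:
            continue
        record = {}
        for idx, (name, unit) in template.columns.items():
            raw = row[idx].strip()
            record[name] = (float(raw), unit) if unit else raw
        records.append(record)
    return records


def compatible(src: str, dst: str) -> bool:
    """Units are compatible when they measure the same dimension."""
    return UNITS[src][0] == UNITS[dst][0]


def convert(value: float, src: str, dst: str) -> float:
    """Convert via the base unit of the shared dimension."""
    if not compatible(src, dst):
        raise ValueError(f"incompatible units: {src} -> {dst}")
    _, s1, o1 = UNITS[src]
    _, s2, o2 = UNITS[dst]
    return ((value * s1 + o1) - o2) / s2


def harmonize(records: list, target_units: dict) -> list:
    """Transform every annotated value to the requested target unit."""
    out = []
    for rec in records:
        new = {}
        for name, val in rec.items():
            if isinstance(val, tuple):
                value, unit = val
                new[name] = round(convert(value, unit, target_units.get(name, unit)), 6)
            else:
                new[name] = val
        out.append(new)
    return out


# Two hypothetical stations reporting the same observation differently.
STATION_A = "# station A\ndate;temp_F;rain_cm\n2020-01-01;50;0.5\n"
TEMPLATE_A = Template(";", 2, {0: ("date", None),
                               1: ("air_temperature", "degF"),
                               2: ("precipitation", "cm")})
STATION_B = "date,tmp,prcp\n2020-01-01,10.0,5\n"
TEMPLATE_B = Template(",", 1, {0: ("date", None),
                               1: ("air_temperature", "degC"),
                               2: ("precipitation", "mm")})
TARGET = {"air_temperature": "degC", "precipitation": "mm"}

if __name__ == "__main__":
    print(harmonize(parse(STATION_A, TEMPLATE_A), TARGET))
    print(harmonize(parse(STATION_B, TEMPLATE_B), TARGET))
```

In this sketch, both stations end up as the same record (10.0 degC, 5.0 mm) even though they differ in delimiter, header layout, column names and units. The design point it illustrates is that only the declarative template and the unit annotations change per dataset; the parsing and conversion code stays the same.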
Chapter 6 concludes this thesis and summarizes its main contributions: providing insights into the limits of contemporary IoT gateways and their performance as active participants in the environmental data lifecycle (Chapter 2); developing an IoT-ready, backwards-compatible extension for OGC SOS to support interoperable on-site data dissemination (Chapter 3); and designing and implementing a declarative approach that facilitates the acquisition, transformation and integration of syntactically (Chapter 4) and semantically (Chapter 5) heterogeneous environmental datasets.
