Context aware benchmarking and tuning of a TByte-scale air quality database and web service

Clara Betancourt,Björn Hagemeier,Sabine Schröder,Martin G Schultz

doi:10.1007/s12145-021-00631-4


Open Access



Journal: Earth Science Informatics
Publication Date: Jun 7, 2021
Citations: 4
License type: open-access

Affiliation: Forschungszentrum Jülich

Abstract

We present context-aware benchmarking and performance engineering of a mature TByte-scale air quality database system, which was created by the Tropospheric Ozone Assessment Report (TOAR) and contains one of the world’s largest collections of near-surface air quality measurements. A special feature of our data service, https://join.fz-juelich.de, is on-demand processing of several air quality metrics directly from the TOAR database. As a service that is used by more than 350 users from the international air quality research community, our web service must be easily accessible and functionally flexible while delivering good performance. The current on-demand calculation of air quality metrics outside the database, together with the necessary transfer of large volumes of raw data, is identified as the major performance bottleneck. In this study, we therefore explore and benchmark in-database approaches for the statistical processing, which result in performance enhancements of up to 32%.

Highlights

Due to enhanced sensor technologies and widened monitoring efforts around the world, scientific databases of environmental observations have grown to terabyte scale.
We benchmarked the following measures to increase the performance of the TByte-scale Tropospheric Ozone Assessment Report (TOAR) air quality observations database and the connected JOIN web service: server-side programming in PL/pgSQL and PL/Python, parallel scans and processing, optimal definition of indices, and on-line aggregation to avoid transferring large volumes of data.
Through the above-mentioned techniques, the performance of JOIN can be improved by approximately 6 to 32%.
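
The difference between computing a metric outside the database and aggregating inside it can be illustrated with a minimal sketch. This is not the TOAR or JOIN implementation: it uses Python's built-in SQLite module as a stand-in for PostgreSQL, a plain mean as a stand-in for the air quality metrics, and a hypothetical table `o3_hourly` with hypothetical columns `station_id`, `hour`, and `value`. It only shows why in-database aggregation reduces the number of rows that must be transferred to the client.

```python
import sqlite3
from collections import defaultdict


def build_demo_db():
    """Create an in-memory database with synthetic hourly values (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE o3_hourly (station_id INTEGER, hour INTEGER, value REAL)"
    )
    # 3 stations x 48 hours of made-up values; these are not real measurements.
    rows = [(s, h, 20.0 + s + (h % 24)) for s in range(3) for h in range(48)]
    conn.executemany("INSERT INTO o3_hourly VALUES (?, ?, ?)", rows)
    # An index on the grouping and time columns, as one might define for such queries.
    conn.execute("CREATE INDEX idx_station_hour ON o3_hourly (station_id, hour)")
    return conn


def mean_client_side(conn):
    """Fetch all raw rows, then compute the per-station mean in the client."""
    raw = conn.execute("SELECT station_id, value FROM o3_hourly").fetchall()
    sums = defaultdict(float)
    counts = defaultdict(int)
    for station_id, value in raw:
        sums[station_id] += value
        counts[station_id] += 1
    means = {sid: sums[sid] / counts[sid] for sid in sums}
    return means, len(raw)  # len(raw) = number of rows transferred


def mean_in_database(conn):
    """Let the database aggregate; only one row per station is transferred."""
    agg = conn.execute(
        "SELECT station_id, AVG(value) FROM o3_hourly GROUP BY station_id"
    ).fetchall()
    return dict(agg), len(agg)


if __name__ == "__main__":
    conn = build_demo_db()
    client_means, client_rows = mean_client_side(conn)
    db_means, db_rows = mean_in_database(conn)
    print("rows transferred (client-side):", client_rows)
    print("rows transferred (in-database):", db_rows)
    print("per-station means (in-database):", db_means)
```

Both functions return the same per-station means, but the client-side variant moves every raw row out of the database, whereas the in-database variant moves one aggregated row per station. In PostgreSQL, the same idea would be expressed with server-side functions and aggregates rather than SQLite.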

Summary

Introduction

Due to enhanced sensor technologies and widened monitoring efforts around the world, scientific databases of environmental observations have grown to terabyte scale. This can pose challenges for their performance, especially when the database is continuously extended with new data (Directorate-General for Communication EC 2018; Gray and Szalay 2002). The TOAR data infrastructure was created by Forschungszentrum Jülich in the context of the global TOAR initiative (Schultz et al 2017). It meets special requirements of the TOAR user community in terms of data acquisition, openness, functionality, flexibility, performance and FAIRness (Wilkinson et al 2016). This motivated us to study various potential performance improvements and report them here.

Objectives

Results

Conclusion