Abstract

In recent years, quantifying the impacts of detrimental air quality has become a global priority for researchers and policy makers. At present, the systems and methodologies supporting the collection and manipulation of air quality data are difficult to access. To support studies quantifying the interplay of common gaseous and particulate pollutants with meteorology and biological particles, this paper presents a comprehensive data-set containing daily air quality readings from the Automatic Urban and Rural Network, and pollen and weather data from Met Office monitoring stations, for the United Kingdom in the years 2016 to 2019 inclusive. We describe (1) the sources from which the data were collected, (2) the methods used for the data cleaning process and (3) how issues related to missing values and sparse regional coverage were addressed. The resulting data-set is designed to be used ‘as is’ by those using air quality data for research; we also describe and provide open access to the methods used for curating the data, so that the data-set can be modified or extended.

Highlights

  • Understanding the ecological effects of air pollution requires the collection of a variety of air quality and meteorological measurements over considerable time periods and across wide geographical regions

  • Some of the challenges faced by data engineers during a data integration process include redundancy, inconsistency and missing values[1]

  • Improving the data integration process, to allow research effort to be focused on knowledge discovery rather than the repeated creation of collection and cleaning methods, or even the repeated processing of the same data, is an ongoing Big Data challenge across many research domains[3,4,5], and is a particular issue for air quality measurements[6] and their interpolation[7]

Background & Summary

Understanding the ecological effects of air pollution requires the collection of a variety of air quality and meteorological measurements over considerable time periods and across wide geographical regions. This necessitates the complex preprocessing and integration of heterogeneous data from different sources into a single, accessible archive. This paper describes the process of curating a data-set that provides rapid access to air quality measurements drawn from the open Automatic Urban and Rural Network (AURN), integrated with other variables that are pertinent to its interpretation. This data-set is already an important part of a number of ongoing projects, including those looking at the optimisation of sensor positioning, the honing of the level of detail required in pollutant speciation, and the relationship between hay fever symptoms and environmental variables (building on the work of Vigo et al.[8]). We describe the data sources, how the data were cleaned and processed, and how estimations have been made across all UK regions to mitigate sparse or missing data.
