Abstract

Clustering – the automated grouping of similar data – can provide powerful and unique insight into large and complex data sets, in a fast and computationally efficient manner. While clustering has been used in a variety of fields (from medical image processing to economics), its application within atmospheric science has been fairly limited to date, and the potential benefits of applying advanced clustering techniques to climate data (both model output and observations) have yet to be fully realised. In this paper, we explore the specific application of clustering to a multi-model climate ensemble. We hypothesise that clustering techniques can provide (a) a flexible, data-driven method of testing model–observation agreement and (b) a mechanism with which to identify model development priorities. We focus our analysis on chemistry–climate model (CCM) output of tropospheric ozone – an important greenhouse gas – from the recent Atmospheric Chemistry and Climate Model Intercomparison Project (ACCMIP). Tropospheric column ozone from the ACCMIP ensemble was clustered using the Data Density based Clustering (DDC) algorithm. We find that a multi-model mean (MMM) calculated using members of the most-populous cluster identified at each location offers a reduction of up to ∼ 20 % in the global absolute mean bias between the MMM and an observed satellite-based tropospheric ozone climatology, with respect to a simple, all-model MMM. On a spatial basis, the bias is reduced at ∼ 62 % of all locations, with the largest bias reductions occurring in the Northern Hemisphere – where ozone concentrations are relatively large. However, the bias is unchanged at 9 % of all locations and increases at 29 %, particularly in the Southern Hemisphere. The latter demonstrates that although cluster-based subsampling acts to remove outlier model data, such data may in fact be closer to observed values in some locations.
We further demonstrate that clustering can provide a viable and useful framework in which to assess and visualise model spread, offering insight into geographical areas of agreement among models and a measure of diversity across an ensemble. Finally, we discuss caveats of the clustering techniques and note that while we have focused on tropospheric ozone, the principles underlying the cluster-based MMMs are applicable to other prognostic variables from climate models.

Highlights

  • Clustering is a flexible and unsupervised numerical technique that involves the segregation of data into statistically similar groups

  • The Data Density based Clustering (DDC) algorithm was applied to the ACCMIP model ensemble of tropospheric column ozone on a monthly basis, and a multi-model mean (MMM) value was calculated as the average of the model values in the primary (most populous) cluster at each location

  • The clustering technique was applied to simulated fields of tropospheric column ozone from the 14 chemistry–climate models (CCMs) that took part in the ACCMIP model inter-comparison
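
As a rough illustration of the cluster-based MMM described in the highlights above, the sketch below computes the mean of the most populous group of model values at a single location and compares it with the simple all-model mean. It does not implement the DDC algorithm: it substitutes a much simpler gap-based grouping of sorted values, chosen only because it fits in a few lines. The function name `primary_cluster_mean`, the `radius` threshold, and the toy column values are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def primary_cluster_mean(values, radius):
    """Return (mean, member indices) of the most populous group of 1-D values.

    Values are sorted and split wherever the gap between neighbouring
    values exceeds `radius`. This is a simple stand-in for a clustering
    step, NOT the DDC algorithm used in the paper.
    """
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    # Positions (in sorted order) at which a new group starts
    breaks = np.where(np.diff(values[order]) > radius)[0] + 1
    groups = np.split(order, breaks)
    primary = max(groups, key=len)  # first largest group wins on ties
    return values[primary].mean(), np.sort(primary)

# Hypothetical column values from five models at one grid cell (toy numbers)
column = np.array([30.0, 31.0, 32.0, 45.0, 29.5])

mmm_all = column.mean()                                   # simple all-model MMM
mmm_primary, members = primary_cluster_mean(column, 2.0)  # cluster-based MMM

print(mmm_all, mmm_primary, members)
```

In this toy case the fourth model sits far from the others, so it is left out of the primary group and the cluster-based mean is lower than the all-model mean. Whether that moves the mean closer to observations depends on the location, as the abstract notes.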

Summary

Introduction

Clustering is a flexible and unsupervised numerical technique that involves the segregation of data into statistically similar groups (or “clusters”). We hypothesise that clustering techniques can provide (a) a flexible, data-driven method of testing model–observation agreement and (b) a mechanism with which to identify model development priorities. In terms of the former, clustering provides a data-driven method of grouping the model output at each place and time by how well each modelled value agrees with the ensemble as a whole. This potentially enables refinement of the ensemble by objectively identifying outlier data at a given place and time on a case-by-case basis, which may remove the need to perform blanket model exclusions. In terms of the latter, clustering provides potential insight into model development needs through exploring the membership of the clusters, for example by asking why a specific model may always be excluded from the most populous cluster at a particular location.
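
To make the idea of exploring cluster membership more concrete, the following sketch counts how often each model falls outside the most populous group at one location across several time steps. As a loud caveat, the grouping here is a simple gap-based split of sorted values, used as a stand-in and not the DDC algorithm; the helper names `primary_members` and `exclusion_frequency`, the `radius` value, and the toy numbers are hypothetical.

```python
import numpy as np

def primary_members(values, radius):
    """Boolean mask of the values in the most populous gap-based group.

    A simple stand-in for a clustering step, NOT the DDC algorithm.
    """
    values = np.asarray(values, dtype=float)
    order = np.argsort(values)
    breaks = np.where(np.diff(values[order]) > radius)[0] + 1
    primary = max(np.split(order, breaks), key=len)
    mask = np.zeros(values.size, dtype=bool)
    mask[primary] = True
    return mask

def exclusion_frequency(series, radius):
    """Fraction of time steps at which each model is outside the primary group.

    `series` has shape (n_times, n_models) and holds one location.
    """
    series = np.asarray(series, dtype=float)
    inside = np.array([primary_members(row, radius) for row in series])
    return 1.0 - inside.mean(axis=0)

# Toy values: three time steps (rows) for four hypothetical models (columns)
toy = np.array([
    [30.0, 31.0, 32.0, 45.0],
    [28.0, 29.0, 30.5, 44.0],
    [33.0, 40.0, 34.0, 35.0],
])

freq = exclusion_frequency(toy, radius=2.0)
print(freq)
```

A model with an exclusion frequency near one at a given location would be a natural candidate for closer inspection, in the spirit of the model development priorities discussed above.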

A brief overview of cluster-based classification
The principles of cluster-based ensemble subsampling
Overview of ACCMIP datasets
Ozone radius selection
Spatial radii selection
Scenarios and metrics
Assessment of cluster-based MMM on a global basis
Assessment of cluster-based MMM: spatial variability
Insights from cluster population into model spread
Insights from cluster membership into model agreement and spread
Future work
Concluding remarks