Abstract

The World Climate Research Programme (WCRP)'s Working Group on Climate Modelling (WGCM) Infrastructure Panel (WIP) was formed in 2014 in response to the explosive growth in size and complexity of Coupled Model Intercomparison Projects (CMIPs) between CMIP3 (2005–2006) and CMIP5 (2011–2012). This article presents the WIP recommendations for the global data infrastructure needed to support CMIP design, future growth, and evolution. Developed in close coordination with those who build and run the existing infrastructure (the Earth System Grid Federation; ESGF), the recommendations are based on several principles, beginning with the need to separate requirements, implementation, and operations. Other important principles include consideration of the diversity of community needs around data (a data ecosystem), the importance of provenance, the need for automation, and the obligation to measure costs and benefits.

This paper concentrates on requirements, recognizing the diversity of communities involved (modelers, analysts, software developers, and downstream users). Such requirements include the need for scientific reproducibility and accountability alongside the need to record and track data usage. One key element is to adopt a dataset-centric rather than system-centric focus, with the aim of making the infrastructure less prone to systemic failure.

With these overarching principles and requirements, the WIP has produced a set of position papers, which are summarized in the latter pages of this document. They provide specifications for managing and delivering model output, including strategies for replication and versioning, licensing, data quality assurance, citation, long-term archiving, and dataset tracking. They also describe a new and more formal approach for specifying what data, and associated metadata, should be saved, which enables future data volumes to be estimated, particularly for well-defined projects such as CMIP6.

The paper concludes with a future-facing consideration of the evolution of the global data infrastructure that follows from the blurring of boundaries between climate and weather, and from the changing nature of published scientific results in the digital age.
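A formal specification of what to save is what makes volume estimates possible: once every requested variable has a known grid, number of levels, and sampling frequency, the volume is just a product summed over variables. The Python sketch below illustrates that arithmetic under stated assumptions. The variable entries, grid dimensions, and 4-byte storage are hypothetical, the names `RequestedVariable` and `estimate_volume` are illustrative, this is not the actual CMIP6 data request software, and the estimate ignores compression and metadata overhead.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestedVariable:
    """One entry in a hypothetical data request (illustrative fields only)."""
    name: str
    nlon: int               # longitude points on the output grid
    nlat: int               # latitude points on the output grid
    nlev: int               # vertical levels (1 for single-level fields)
    samples_per_year: int   # e.g. 12 for monthly means, 365 for daily on a no-leap calendar
    bytes_per_value: int = 4  # assumption: 32-bit floating-point storage, no compression

    def bytes_per_year(self) -> int:
        """Uncompressed bytes written per simulated year for this variable."""
        return (self.nlon * self.nlat * self.nlev
                * self.samples_per_year * self.bytes_per_value)


def estimate_volume(request: Sequence[RequestedVariable], years: int, members: int = 1) -> int:
    """Uncompressed bytes for all requested variables, over all years and ensemble members."""
    per_year = sum(v.bytes_per_year() for v in request)
    return per_year * years * members


if __name__ == "__main__":
    # Hypothetical request: monthly tas, plus monthly ta on 19 pressure levels, on a 1-degree grid.
    request = [
        RequestedVariable("tas", nlon=360, nlat=180, nlev=1, samples_per_year=12),
        RequestedVariable("ta", nlon=360, nlat=180, nlev=19, samples_per_year=12),
    ]
    total = estimate_volume(request, years=165)  # 165 years, i.e. a 1850-2014 historical run
    print(f"{total:,} bytes (~{total / 1e9:.1f} GB uncompressed)")
```

For this two-variable example, one 165-year ensemble member comes to 10,264,320,000 bytes, roughly 10 GB before compression. Because the total scales linearly with years and members, multiplying out a full request across many models and experiments quickly shows why data volumes dominate infrastructure planning.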

Highlights

  • The Data Reference Syntax (DRS) elements that rely on controlled vocabularies appear as netCDF attributes and are used to construct file names, directory names, and the unique dataset identifiers that are essential throughout the CMIP6 infrastructure
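The highlight above describes DRS elements doing several jobs at once: the same facet values name directories, files, and datasets. The Python sketch below illustrates only the naming part of that idea. One dictionary of facet values is checked against a small stand-in for the controlled vocabularies and then assembled into a directory path, a file name, and a dot-separated dataset identifier. The facet orderings follow our understanding of the CMIP6 DRS conventions; the vocabulary entries, the example facet values, and the helper names `validate` and `drs_names` are illustrative assumptions, not official CMIP6 tooling.

```python
import re

# Illustrative subset of controlled vocabularies (CVs). The official CMIP6 CVs are
# much larger and maintained separately; these entries are assumptions for the sketch.
CV = {
    "mip_era": {"CMIP6"},
    "activity_id": {"CMIP", "ScenarioMIP"},
    "experiment_id": {"historical", "piControl", "ssp585"},
    "table_id": {"Amon", "Omon", "day"},
    "grid_label": {"gn", "gr", "gr1"},
}

# Facet order for the directory path; the last facet is the dataset version.
DIR_FACETS = ["mip_era", "activity_id", "institution_id", "source_id", "experiment_id",
              "member_id", "table_id", "variable_id", "grid_label", "version"]
# Facet order for the file name; an optional time range is appended after these.
FILE_FACETS = ["variable_id", "table_id", "source_id", "experiment_id",
               "member_id", "grid_label"]


def validate(facets):
    """Raise ValueError if a facet falls outside the CVs or its expected pattern."""
    for key, allowed in CV.items():
        if facets[key] not in allowed:
            raise ValueError(f"{key}={facets[key]!r} is not in the controlled vocabulary")
    if not re.fullmatch(r"r\d+i\d+p\d+f\d+", facets["member_id"]):
        raise ValueError(f"malformed member_id {facets['member_id']!r}")
    if not re.fullmatch(r"v\d{8}", facets["version"]):
        raise ValueError(f"malformed version {facets['version']!r}")


def drs_names(facets, time_range=None):
    """Return (directory, file name, dataset identifier) built from one set of facets."""
    validate(facets)
    directory = "/".join(facets[k] for k in DIR_FACETS)
    stem = "_".join(facets[k] for k in FILE_FACETS)
    filename = f"{stem}_{time_range}.nc" if time_range else f"{stem}.nc"
    dataset_id = ".".join(facets[k] for k in DIR_FACETS[:-1])  # identifier without version
    return directory, filename, dataset_id


if __name__ == "__main__":
    example = {  # illustrative facet values only
        "mip_era": "CMIP6", "activity_id": "CMIP", "institution_id": "NOAA-GFDL",
        "source_id": "GFDL-CM4", "experiment_id": "historical", "member_id": "r1i1p1f1",
        "table_id": "Amon", "variable_id": "tas", "grid_label": "gr1", "version": "v20180701",
    }
    for name in drs_names(example, time_range="185001-194912"):
        print(name)
```

Deriving every name from one validated set of facets, rather than composing each by hand, is what keeps paths, file names, and identifiers consistent with one another; a typo in an experiment name is rejected once, before it can propagate into three different places.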

Introduction

CMIP6 (Eyring et al, 2016a), the latest Coupled Model Intercomparison Project (CMIP), can trace its genealogy back to the “Charney report” (Charney et al, 1979). We aim to show how the scientific design of CMIP6, as outlined in Eyring et al (2016a), translates into infrastructural requirements. We hope this will be instructive to the MIP chairs and creators of multi-model experiments by highlighting the resource implications. By describing how the design of this infrastructure is severely constrained by resources, we hope to provide a useful perspective to those who find data acquisition and analysis a technical challenge. We also hope this will be of interest to general readers of the journal from other geoscience fields, illuminating the particular character of global data infrastructure for climate data, where the community of users far outstrips the Earth system modelling community itself in numbers and diversity.

Historical context
Infrastructural principles
A structured approach to data production
CMIP6 data request
Model inputs
Data reference syntax
CMIP6 data volumes
Complexity
Licensing
Persistent identifiers for acknowledgment and citation
Quality assurance
Documentation of provenance
Replication
Versioning
Errata
The future of the global data infrastructure