Abstract

High-resolution Earth system model simulations generate enormous data volumes, and retaining the data from these simulations often strains institutional storage resources. Further, these exceedingly large storage requirements negatively impact science objectives, for example, by forcing reductions in data output frequency, simulation length, or ensemble size. To lessen data volumes from the Community Earth System Model (CESM), we advocate the use of lossy data compression techniques. While lossy data compression does not exactly preserve the original data (as lossless compression does), lossy techniques have an advantage in terms of smaller storage requirements. To preserve the integrity of the scientific simulation data, the effects of lossy data compression on the original data should, at a minimum, not be statistically distinguishable from the natural variability of the climate system, and previous preliminary work with data from CESM has shown this goal to be attainable. However, to ultimately convince climate scientists that it is acceptable to use lossy data compression, we provide climate scientists with access to publicly available climate data that have undergone lossy data compression. In particular, we report on the results of a lossy data compression experiment with output from the CESM Large Ensemble (CESM-LE) Community Project, in which we challenge climate scientists to examine features of the data relevant to their interests, and attempt to identify which of the ensemble members have been compressed and reconstructed. We find that while detecting distinguishing features is certainly possible, the compression effects noticeable in these features are often unimportant or disappear in post-processing analyses. In addition, we perform several analyses that directly compare the original data to the reconstructed data to investigate the preservation, or lack thereof, of specific features critical to climate science.
Overall, we conclude that applying lossy data compression to climate simulation data is both advantageous in terms of data reduction and generally acceptable in terms of effects on scientific results.

Highlights

  • Earth system models are widely used to study and understand past, present, and future climate states

  • We describe several of the analyses done by scientists and detail the results and the lessons that we learned from their investigations

  • The effectiveness of compression is generally measured by a compression ratio (CR), which is the ratio of the size of the compressed file to that of the original file
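
As a minimal sketch of the CR definition above (the function name and the file sizes are hypothetical, chosen only for illustration), the ratio can be computed directly from two file sizes. Under this convention, a smaller CR means stronger compression:

```python
def compression_ratio(compressed_bytes: int, original_bytes: int) -> float:
    """Return CR = compressed size / original size (smaller means more reduction)."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    return compressed_bytes / original_bytes


# Hypothetical sizes: a 200 MB file reconstructed from a 40 MB compressed file.
cr = compression_ratio(40_000_000, 200_000_000)
print(f"CR = {cr:.2f}")  # prints "CR = 0.20"
```

For real files, the two sizes could be read with `os.path.getsize` on the compressed and original paths.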

Summary

Introduction

Earth system models are widely used to study and understand past, present, and future climate states. To build confidence in data compression techniques and promote acceptance in the climate community, our aim in this work is to investigate whether applying lossy compression impacts science results or conclusions from a large and publicly available CESM dataset. To this end, we provided climate scientists with access to climate data via the CESM-LE project (Kay et al., 2015).

Data compression
The CESM Large Ensemble project dataset
Approach
Ensemble data evaluations
Climate characteristics
Surface temperature
Top-of-the-atmosphere model radiation
Surface energy balance
Precipitation and evaporation
Differenced temperature field
Ensemble variability patterns
Overview of proper orthogonal decomposition
Application to ensemble data
The original and reconstructed data
Overview of extreme value theory
Causal signatures
AMWG diagnostics package
Lessons learned
Relationships between variables
Individual treatment of variables
Implications for compression algorithms
Concluding remarks
Findings
Code and data availability
