A note on precision-preserving compression of scientific data

Rostislav Kouznetsov

doi:10.5194/gmd-14-377-2021

Abstract

Lossy compression of scientific data arrays is a powerful tool to save network bandwidth and storage space. Properly applied lossy compression can reduce the size of a dataset by orders of magnitude while keeping all essential information, whereas a wrong choice of lossy compression parameters leads to the loss of valuable data. An important class of lossy compression methods is so-called precision-preserving compression, which guarantees that a certain precision of each number will be kept. The paper considers statistical properties of several precision-preserving compression methods implemented in NetCDF Operators (NCO), a popular tool for handling and transformation of numerical data in NetCDF format. We compare artifacts resulting from the use of precision-preserving compression of floating-point data arrays. In particular, we show that the popular Bit Grooming algorithm (default in NCO until recently) has suboptimal accuracy and produces substantial artifacts in multipoint statistics. We suggest a simple implementation of two algorithms that are free from these artifacts and have double the precision. One of them can be used to rectify data already processed with Bit Grooming. We compare precision trimming for relative and absolute precision to the popular linear packing (LP) method and find that LP has no advantage over precision trimming at a given maximum absolute error. We give examples in which LP leads to an unconstrained error in the integral characteristic of a field or to unphysical values. We analyze compression efficiency as a function of target precision for two synthetic datasets and discuss the precision needed in several atmospheric fields. Mantissa rounding has been contributed to the NCO mainstream as a replacement for Bit Grooming. The Appendix contains code samples implementing precision trimming in Python 3 and Fortran 95.

Highlights

The resolution and level of detail of processes simulated with geoscientific models increase along with available computing power.
A simple method for trimming precision by rounding the mantissa of floating-point numbers has been implemented and tested. It has been incorporated into the NetCDF Operators (NCO) mainstream and has been used by default since v4.9.4.
The method has half the quantization error of the Bit Grooming method (Zender, 2016), which was used by default in earlier versions of NCO.
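
The factor of two between the two methods can be checked numerically. The following is a minimal sketch, assuming IEEE 754 single-precision input and NumPy; the function names (`shave`, `set_bits`, `groom`, `round_mantissa`) are hypothetical and this is not the NCO source code. In particular, the `groom` here simply alternates shaving and setting on consecutive elements and does not special-case zeros.

```python
import numpy as np

MANTISSA_BITS = 23  # explicit mantissa bits in IEEE 754 single precision


def _bits(a):
    """Return a fresh uint32 array holding the bit patterns of `a` as float32."""
    return np.array(a, dtype=np.float32).view(np.uint32)


def _masks(keepbits):
    """Return (number of dropped bits, mask of kept bits, mask of dropped bits)."""
    drop = MANTISSA_BITS - keepbits
    keep = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
    tail = np.uint32((1 << drop) - 1)
    return drop, keep, tail


def shave(a, keepbits):
    """Set the dropped mantissa bits to zero (truncation towards zero)."""
    _, keep, _ = _masks(keepbits)
    return (_bits(a) & keep).view(np.float32)


def set_bits(a, keepbits):
    """Set the dropped mantissa bits to one."""
    _, _, tail = _masks(keepbits)
    return (_bits(a) | tail).view(np.float32)


def groom(a, keepbits):
    """Simplified Bit Grooming: shave even-indexed, set odd-indexed elements (1-D input)."""
    out = shave(a, keepbits)
    out[1::2] = set_bits(a, keepbits)[1::2]
    return out


def round_mantissa(a, keepbits):
    """Round the mantissa to `keepbits` bits, ties to even (needs keepbits < 23)."""
    drop, keep, _ = _masks(keepbits)
    b = _bits(a)
    # Adding (half quantum - 1) plus the lowest kept bit carries exactly when
    # the dropped tail is above half, or equal to half with an odd kept bit.
    b += ((b >> np.uint32(drop)) & np.uint32(1)) + np.uint32((1 << (drop - 1)) - 1)
    return (b & keep).view(np.float32)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(1.0, 1.99, 100_000).astype(np.float32)
    for name, func in [("groom", groom), ("round", round_mantissa)]:
        err = np.abs(func(x, 7).astype(np.float64) - x)
        print(f"{name:5s} max abs error = {err.max():.3e}")
```

For values in [1, 2) and 7 kept mantissa bits, the quantum is 2^-7. The rounding error stays within half of it, while shaving and setting (and therefore grooming) can each be off by almost a full quantum.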

Summary

Introduction

The resolution and level of detail of processes simulated with geoscientific models increase along with available computing power. Transformations of a data array that reduce its information entropy while introducing acceptable distortions form the basis of lossy compression algorithms. A commonly used method of lossy compression is linear packing, in which the original floating-point data are mapped to shorter integer data by a linear transformation. Setting a certain number of least significant bits of the floating-point numbers in a data array to a prescribed value (trimming the precision) substantially reduces the entropy of the data, making lossless compression algorithms much more efficient. Zender (2016) implemented precision trimming in a versatile data-processing tool set called NetCDF Operators (NCO; http://nco.sourceforge.net, last access: 7 December 2020), enabling the internal data compression features of the NetCDF4 format (https://www.unidata.ucar.edu/software/netcdf, last access: 7 December 2020) to work efficiently. Appendix A contains example implementations of subroutines for precision trimming.
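
Both approaches described above can be illustrated with a short sketch. It assumes NumPy and the standard-library `zlib` module as a stand-in for the DEFLATE compression used inside NetCDF4; the helper names (`trim_precision`, `pack16`, `unpack16`, `deflated_size`) are hypothetical and are not the subroutines from Appendix A. Linear packing here follows the common `packed * scale + offset` convention with 16-bit unsigned integers and assumes the field is not constant.

```python
import zlib

import numpy as np


def trim_precision(a, keepbits):
    """Round float32 mantissas to `keepbits` bits (ties to even), zeroing the rest."""
    drop = 23 - keepbits  # float32 has 23 explicit mantissa bits
    b = np.array(a, dtype=np.float32).view(np.uint32)
    b += ((b >> np.uint32(drop)) & np.uint32(1)) + np.uint32((1 << (drop - 1)) - 1)
    b &= np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
    return b.view(np.float32)


def pack16(a):
    """Linear packing: map the data range onto the integers 0..65535."""
    a64 = np.asarray(a, dtype=np.float64)
    offset = a64.min()
    scale = (a64.max() - offset) / 65535.0  # assumes max > min
    packed = np.round((a64 - offset) / scale).astype(np.uint16)
    return packed, scale, offset


def unpack16(packed, scale, offset):
    """Inverse of pack16, up to the quantization error."""
    return packed.astype(np.float64) * scale + offset


def deflated_size(arr):
    """Size in bytes of the array's raw buffer after DEFLATE compression."""
    return len(zlib.compress(arr.tobytes(), 6))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic positive field spanning a few orders of magnitude.
    field = np.exp(rng.normal(size=100_000)).astype(np.float32)

    trimmed = trim_precision(field, keepbits=7)
    packed, scale, offset = pack16(field)

    print("raw     :", deflated_size(field), "bytes")
    print("trimmed :", deflated_size(trimmed), "bytes")
    print("packed  :", deflated_size(packed), "bytes")

    rel_trim = np.abs(trimmed / field - 1.0).max()
    abs_pack = np.abs(unpack16(packed, scale, offset) - field).max()
    print("max relative error, trimming:", rel_trim)
    print("max absolute error, packing :", abs_pack)
```

Trimming bounds the relative error of every value, whereas linear packing bounds the absolute error by half of `scale`, so the smallest values of a field with a wide dynamic range lose most of their relative precision under packing.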

Precision-trimming methods

Quantification of errors

Examples

Keeping absolute precision

Precision of linear packing

Compressing precision-trimmed data

Practical examples

Conclusions

Halfshave

Findings

Rounding to given absolute precision in Fortran 95

Journal: Geoscientific Model Development
Publication Date: Jan 22, 2021
Citations: 4
License type: CC BY 4.0
