Abstract

Lossy compression of scientific data arrays is a powerful tool for saving network bandwidth and storage space. Properly applied, lossy compression can reduce the size of a dataset by orders of magnitude while keeping all essential information, whereas a wrong choice of lossy compression parameters leads to the loss of valuable data. An important class of lossy compression methods is so-called precision-preserving compression, which guarantees that a certain precision of each number will be kept. The paper considers statistical properties of several precision-preserving compression methods implemented in NetCDF Operators (NCO), a popular tool for handling and transforming numerical data in NetCDF format. We compare artifacts resulting from the use of precision-preserving compression of floating-point data arrays. In particular, we show that the popular Bit Grooming algorithm (the default in NCO until recently) has suboptimal accuracy and produces substantial artifacts in multipoint statistics. We suggest a simple implementation of two algorithms that are free from these artifacts and have double the precision. One of them can be used to rectify data already processed with Bit Grooming. We compare precision trimming for relative and absolute precision to the popular linear packing (LP) method and find that LP has no advantage over precision trimming at a given maximum absolute error. We give examples where LP leads to an unconstrained error in an integral characteristic of a field or leads to unphysical values. We analyze compression efficiency as a function of target precision for two synthetic datasets and discuss the precision needed in several atmospheric fields. Mantissa rounding has been contributed to the NCO mainstream as a replacement for Bit Grooming. The Appendix contains code samples implementing precision trimming in Python3 and Fortran 95.

Highlights

  • The resolution and level of detail of processes simulated with geoscientific models increase together with the available computing power.

  • A simple method for trimming precision by rounding the mantissa of floating-point numbers has been implemented and tested. It has been incorporated into the NetCDF Operators (NCO) mainstream and has been used by default since v4.9.4.

  • The method has half the quantization error of the Bit Grooming method (Zender, 2016), which was used by default in earlier versions of NCO.


Summary

Introduction

The resolution and level of detail of processes simulated with geoscientific models increase together with the available computing power. Transformations of a data array that reduce its information entropy while introducing acceptable distortions form the basis of lossy compression algorithms. An often-used method of lossy compression is linear packing, in which the original floating-point data are mapped to shorter-length integer data by a linear transformation. Setting a certain number of least significant bits of the floating-point numbers in a data array to a prescribed value (trimming the precision) substantially reduces the entropy of the data, making lossless compression algorithms much more efficient. Zender (2016) implemented precision trimming in a versatile data-processing tool set called NetCDF Operators (NCO; http://nco.sourceforge.net, last access: 7 December 2020), enabling the internal data compression features of the NetCDF4 format (https://www.unidata.ucar.edu/software/netcdf, last access: 7 December 2020) to work efficiently. Appendix A contains example implementations of subroutines for precision trimming.
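To make the idea of precision trimming concrete, the following is a minimal Python sketch for IEEE 754 single-precision arrays. It is not the code from Appendix A or from NCO; the function names `shave_bits` and `round_mantissa` and the parameter `keep_bits` are hypothetical and chosen for illustration only. The sketch assumes finite input values and does not treat NaN or infinity specially. `shave_bits` sets the trailing mantissa bits to zero, while `round_mantissa` rounds to the nearest representable value with the same number of kept bits, which halves the maximum relative error.

```python
import zlib

import numpy as np

MANTISSA_BITS = 23  # explicit mantissa bits in an IEEE 754 float32


def _mask(drop):
    """Bit mask that clears the `drop` least significant bits of a uint32."""
    return np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)


def shave_bits(a, keep_bits):
    """Zero the trailing mantissa bits, keeping `keep_bits` of them.

    Hypothetical helper; assumes finite float32-compatible input.
    """
    drop = MANTISSA_BITS - keep_bits
    if not 1 <= drop <= MANTISSA_BITS:
        raise ValueError("keep_bits must be between 0 and 22")
    bits = np.asarray(a, dtype=np.float32).view(np.uint32)
    return (bits & _mask(drop)).view(np.float32)


def round_mantissa(a, keep_bits):
    """Round the mantissa to `keep_bits` bits (ties go away from zero).

    Adding half of the last kept unit before masking turns truncation
    into rounding. Hypothetical helper; assumes finite input.
    """
    drop = MANTISSA_BITS - keep_bits
    if not 1 <= drop <= MANTISSA_BITS:
        raise ValueError("keep_bits must be between 0 and 22")
    half = np.uint32(1 << (drop - 1))
    bits = np.asarray(a, dtype=np.float32).view(np.uint32)
    return ((bits + half) & _mask(drop)).view(np.float32)


def deflated_size(a):
    """Size in bytes of the array after lossless zlib (DEFLATE) compression."""
    return len(zlib.compress(np.ascontiguousarray(a).tobytes()))
```

A possible use is to trim a field with `round_mantissa(field, 8)` and compare `deflated_size` before and after. The trimmed array keeps the float32 type and size in memory; the saving appears only once a lossless compressor is applied to it, as with the internal compression of NetCDF4 mentioned above.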

Precision-trimming methods
Quantification of errors
Examples
Keeping absolute precision
Precision of linear packing
Compressing precision-trimmed data
Practical examples
Conclusions
Halfshave
Findings
Rounding to given absolute precision in Fortran 95