Abstract

While periodic checkpointing has been an important mechanism for tolerating faults in high-performance computing (HPC) systems, it becomes cost-prohibitive as HPC systems approach exascale. Applying compression is a common way to mitigate this burden by reducing data size, but general-purpose compressors are often less effective on scientific datasets. Traditional lossless compression techniques that look for repeated patterns perform poorly on high-precision scientific data, in which common patterns are rare. In this paper, we present a comparison of several lossless and lossy data compression algorithms and discuss their methodology in the exascale environment. As data volumes increase, we observe a growing trend of domain-driven algorithms that exploit characteristics inherent to many scientific datasets, such as relatively small changes in data values from one simulation iteration to the next or among neighboring data points. In particular, significant data reduction has been observed with lossy compression. This paper also discusses how the errors introduced by lossy compression are controlled and the tradeoffs with the compression ratio.
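The "small changes between successive iterations or neighboring values" observation is what makes such domain-driven preprocessing pay off. As a rough, hypothetical illustration (not an algorithm from the surveyed work), the Python sketch below XORs each double-precision value with its predecessor before handing the bytes to a general-purpose lossless compressor; the synthetic field, and the use of NumPy and zlib, are assumptions made purely for demonstration.

```python
# Hypothetical sketch (not from the paper): exploit similarity between
# neighboring values by XOR-ing each double with its predecessor before
# applying a general-purpose lossless compressor.
import zlib
import numpy as np

def zlib_size(buf: bytes) -> int:
    """Byte count after zlib compression at maximum level."""
    return len(zlib.compress(buf, level=9))

# Synthetic checkpoint-like field: a smooth signal plus small noise, so
# neighboring values typically share their sign, exponent, and high mantissa bits.
rng = np.random.default_rng(0)
field = np.sin(np.linspace(0.0, 10.0, 1_000_000)) + 1e-3 * rng.normal(size=1_000_000)

bits = field.view(np.uint64)                        # reinterpret doubles as raw bit patterns
xor_delta = np.concatenate(([bits[0]], bits[1:] ^ bits[:-1]))

print("raw doubles :", zlib_size(field.tobytes()), "bytes")
print("xor deltas  :", zlib_size(xor_delta.tobytes()), "bytes")
```

Real floating-point compressors use far more elaborate predictors and encoders, but the preprocess-then-encode structure sketched here is the idea the abstract refers to.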

Highlights

  • Future extreme-scale computing systems [13, 35] face several challenges in architecture, energy constraints, memory scaling, limited I/O, and scalability of software stacks

  • Because scientific datasets consist mostly of floating-point numbers, naive use of compression algorithms yields only limited data reduction while incurring high compression overhead

  • This paper argues that while traditional checkpointing remains a crucial mechanism for tolerating system failures in many scientific applications, it is becoming challenging in the exascale era, mainly because of limited I/O scalability and the associated energy cost

Summary

Introduction

Future extreme-scale computing systems [13, 35] face several challenges in architecture, energy constraints, memory scaling, limited I/O, and scalability of software stacks. Because scientific datasets consist mostly of floating-point numbers (in single or double precision), naive use of compression algorithms yields only limited data reduction while incurring high compression overhead. Applying lossy compression to checkpointing poses several challenges for large-scale simulations, e.g., guaranteeing point-wise error bounds defined by the user, reclaiming a sufficiently large amount of storage space, performing compression in situ (to reduce data movement), and taking advantage of the reduced data size, potentially by using locally available non-volatile storage devices. Overall, applying data compression at runtime for checkpointing will become appealing for large-scale scientific simulations on future high-performance computing systems, as it can reduce storage space as well as save the energy associated with data movement.
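To make the "point-wise error bounds defined by the user" requirement concrete, here is a minimal, hypothetical sketch of an error-bounded lossy scheme: values are quantized to multiples of twice the requested absolute bound and the quantization indices are stored losslessly, so every reconstructed value differs from the original by at most the bound. NumPy, zlib, and the synthetic data are assumptions; this is not any specific compressor from the survey, and production error-bounded compressors add prediction and more sophisticated encoding.

```python
# Minimal, hypothetical sketch of an error-bounded lossy compressor:
# uniform quantization with a user-defined absolute error bound, followed
# by lossless encoding of the quantization indices.
import zlib
import numpy as np

def lossy_compress(data: np.ndarray, abs_err: float) -> bytes:
    """Quantize to multiples of 2*abs_err; every value is recoverable to within abs_err."""
    indices = np.round(data / (2.0 * abs_err)).astype(np.int64)
    return zlib.compress(indices.tobytes(), level=9)

def lossy_decompress(blob: bytes, abs_err: float) -> np.ndarray:
    indices = np.frombuffer(zlib.decompress(blob), dtype=np.int64)
    return indices.astype(np.float64) * (2.0 * abs_err)

field = np.sin(np.linspace(0.0, 10.0, 1_000_000))    # synthetic checkpoint-like data
bound = 1e-4                                          # user-defined point-wise error bound

blob = lossy_compress(field, bound)
restored = lossy_decompress(blob, bound)

assert np.max(np.abs(restored - field)) <= bound      # verify the point-wise guarantee
print(f"compression ratio: {field.nbytes / len(blob):.1f}x")
```

The verification step after decompression mirrors the requirement above: the bound is checked against every reconstructed value, not merely on average.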

Lossless Compression
Integration within Checkpointing Framework
Increasing Compressibility through Transformations
Comparison
Lossy Compression
Transformation Schemes
Approximation Algorithms
Error Bounding Methods
Tradeoffs: Approximation Precision and Error Rate
Findings
Conclusion