Abstract

Extreme-scale scientific simulations are already generating more data than can be communicated, stored, and analyzed, and the data flood will only worsen with future exascale systems. This is true for output data as well as checkpoint/restart data. Scientific data reduction is a necessity to drastically accelerate I/O, reduce the data footprint on storage, and significantly speed up computation, as demonstrated by the 2017 Gordon Bell award winner. But reduction should be performed wisely, both to preserve execution correctness (checkpoint/restart) and to keep the information that matters to scientists. We can try to develop application-specific lossy data reduction techniques or to compress datasets with advanced generic lossless compression algorithms. Unfortunately, these two approaches are either impractical for most applications or do not provide enough data reduction for scientific datasets. Other domains already familiar with Big Data employ lossy compression massively to reduce data size. However, lossy compression has very rarely been applied to scientific simulation data and, as a result, is not well understood. In this talk, we will present challenges and opportunities regarding compression algorithms and the application of lossy compression to scientific data. We will detail not only the best-in-class compression algorithms but also the tools needed to comprehensively assess the error introduced by lossy compression. We will give examples of lossy compression of scientific datasets with applications to visualization and checkpoint/restart. Lossy compression of scientific data reveals itself as a fascinating young research domain with many opportunities to explore and discover new techniques.
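As a concrete illustration of the error-assessment side mentioned above, the minimal sketch below computes a few standard distortion metrics (maximum pointwise error, RMSE, PSNR) between an original field and its lossy reconstruction. The function names are illustrative, and a crude uniform quantizer stands in for an actual scientific-data compressor such as SZ or ZFP; it is a sketch under those assumptions, not the method described in the talk.

```python
# Minimal error-assessment sketch for lossy compression of a scientific field.
# Assumes NumPy arrays; quantize() is a stand-in for a real lossy compressor.
import numpy as np

def assess_lossy_error(original: np.ndarray, reconstructed: np.ndarray) -> dict:
    """Compute common pointwise and aggregate error metrics."""
    diff = reconstructed.astype(np.float64) - original.astype(np.float64)
    value_range = float(original.max() - original.min())
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return {
        "max_abs_error": float(np.max(np.abs(diff))),  # worst-case pointwise error
        "rmse": rmse,                                   # aggregate distortion
        "psnr_db": 20 * np.log10(value_range / rmse) if rmse > 0 else float("inf"),
    }

def quantize(data: np.ndarray, abs_error_bound: float) -> np.ndarray:
    """Uniform quantization honoring a fixed absolute error bound (compressor stand-in)."""
    step = 2 * abs_error_bound
    return np.round(data / step) * step

if __name__ == "__main__":
    field = np.random.default_rng(0).normal(size=(256, 256))  # synthetic 2-D field
    recon = quantize(field, abs_error_bound=1e-3)
    print(assess_lossy_error(field, recon))
```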
