Abstract

The different rates of increase for the computational power and storage capabilities of supercomputers turn data storage into a technical and economic problem. Because storage capabilities are lagging behind, investments and operational costs for storage systems have increased to keep up with supercomputers' I/O requirements. One promising approach is to reduce the amount of data that is stored. In this paper, we take a look at the impact of compression on the performance and costs of high performance systems. To this end, we analyze the applicability of compression on all layers of the I/O stack, that is, main memory, network, and storage. Based on the Mistral system of the German Climate Computing Center (Deutsches Klimarechenzentrum, DKRZ), we illustrate potential performance improvements and cost savings. Making use of compression on a large scale can decrease investments and operational costs by 50% without negatively impacting performance. Additionally, we present ongoing work for supporting enhanced adaptive compression in the parallel distributed file system Lustre as well as application-specific compression.
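
To give a sense of the layer-level analysis, the following small Python sketch (illustrative only, not taken from the paper) measures the compression ratio and speed of a fast compressor setting on a synthetic buffer; the paper's measurements use real climate data and fast compressors rather than zlib on synthetic input.

    import time
    import zlib

    # Illustrative only: a synthetic, partially redundant buffer stands in
    # for climate model output; real data and fast compressors (lz4, lzo,
    # zstd, ...) will yield different ratios and speeds.
    data = (b"temperature=273.15;" * 50_000) + bytes(range(256)) * 1_000

    start = time.perf_counter()
    compressed = zlib.compress(data, level=1)  # low level ~ fast setting
    elapsed = time.perf_counter() - start

    ratio = len(data) / len(compressed)
    speed = len(data) / elapsed / 2**20  # MiB/s of uncompressed input

    print(f"ratio {ratio:.2f}, compression speed {speed:.0f} MiB/s")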

Highlights

  • Throughout the history of supercomputers as recorded by the TOP500 list, computational power has been increasing exponentially, doubling roughly every 14.5 months [36]. While this increase has allowed more detailed numerical simulations to be performed, it has also caused the simulation results to grow exponentially in size.

  • The storage is distributed across roughly 60 Scalable Storage Units (SSUs), each containing two complete storage servers, and 60 Expansion Storage Units (ESUs), which are simple JBODs, each connected to one SSU.

  • S1: We determine the number of SSU/ESU pairs necessary to achieve a capacity of 50 PB and purchase only this amount.
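
A back-of-the-envelope version of scenario S1 can be sketched as follows; the per-pair capacity and price below are hypothetical placeholders, not DKRZ's actual procurement figures. With a given compression ratio, the raw capacity that has to be purchased for 50 PB of usable storage, and with it the number of SSU/ESU pairs, shrinks roughly proportionally.

    import math

    # Hypothetical figures for illustration; the paper's cost model uses
    # DKRZ's actual per-unit capacities and prices, not repeated here.
    TARGET_PB = 50.0         # required usable capacity (scenario S1)
    PB_PER_PAIR = 0.85       # assumed usable capacity per SSU/ESU pair
    COST_PER_PAIR = 250_000  # assumed price per pair (EUR)

    def pairs_needed(compression_ratio: float) -> int:
        """SSU/ESU pairs required to present TARGET_PB of logical capacity
        when all data is stored with the given compression ratio."""
        raw_pb = TARGET_PB / compression_ratio
        return math.ceil(raw_pb / PB_PER_PAIR)

    for ratio in (1.0, 1.5, 2.0):
        n = pairs_needed(ratio)
        print(f"ratio {ratio:.1f}: {n:3d} pairs, ~{n * COST_PER_PAIR / 1e6:.1f} M EUR")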

Summary

Introduction

Throughout the history of supercomputers as recorded by the TOP500 list, computational power has been increasing exponentially, doubling roughly every 14.5 months [36]. Data reduction can be used to reduce the costs and size of storage systems and to increase performance. Previous studies have shown that certain data reduction techniques can be beneficial for large-scale storage systems [25]. Due to their inherent costs and complexities, techniques such as deduplication and re-computation are not suitable without restrictions. We will investigate the possibilities of applying compression at various levels of the HPC hardware/software stack. In this regard, we will analyze typical storage behavior from a datacenter perspective and not focus on particular use cases or data formats. In particular, we present models to estimate the impact of compression on the performance and cost of supercomputers and their storage systems.
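
The performance side of such a model can be approximated with a simple throughput estimate (my own simplification, not necessarily the paper's formulation): when compression is pipelined with I/O, the effective throughput in uncompressed bytes is bounded by the slower of the compressor and the device being fed compressed data; when the two steps run serially, their costs add up.

    def effective_throughput(device_bw: float, compress_speed: float,
                             ratio: float, overlapped: bool = True) -> float:
        """Estimated write throughput in uncompressed bytes/s.

        device_bw      -- raw storage or network bandwidth (bytes/s)
        compress_speed -- compressor speed on uncompressed data (bytes/s)
        ratio          -- compression ratio (uncompressed / compressed size)
        overlapped     -- compression pipelined with I/O, or done serially
        """
        io_speed = device_bw * ratio  # device drains 'ratio' x more logical data
        if overlapped:
            return min(compress_speed, io_speed)
        return 1.0 / (1.0 / compress_speed + 1.0 / io_speed)

    GiB = 2**30
    # Example: 1 GiB/s device, a 3 GiB/s fast compressor, compression ratio 2.0
    print(effective_throughput(1 * GiB, 3 * GiB, 2.0) / GiB)         # -> 2.0
    print(effective_throughput(1 * GiB, 3 * GiB, 2.0, False) / GiB)  # -> 1.2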

Compression algorithms
Memory capacity
Network throughput
Storage capacity and throughput
File systems
Cost and energy efficiency
Modeling the impact of compression
Performance considerations
Parallel distributed file systems
Cost considerations
Main memory
Network
Storage
Summary
Ongoing work
File system compression
Application-specific compression
Findings
Conclusion and future work