Abstract

Achieving high sustained simulation performance is one of the most important concerns in the HPC community. To this end, many kinds of HPC system architectures have been proposed, and the diversity of HPC systems is growing rapidly. Under these circumstances, the vector-parallel supercomputer SX-ACE has been designed to achieve high sustained performance on memory-intensive applications by providing a memory bandwidth commensurate with its high computational capability. This paper examines the potential of a modern vector-parallel supercomputer through the performance evaluation of SX-ACE using practical engineering and scientific applications. To improve the sustained performance of practical applications, SX-ACE adopts an advanced memory subsystem with several new architectural features. This paper discusses how these features, such as miss status handling registers (MSHR), a large on-chip memory, and novel vector processing mechanisms, help achieve high sustained performance for large-scale engineering and scientific simulations. The evaluation results clearly indicate that the high sustained memory performance per core enables the modern vector supercomputer to achieve outstanding performance that is unreachable by simply increasing the number of fine-grained scalar processor cores. This paper also discusses the performance of the HPCG benchmark to evaluate the potential of supercomputers with balanced memory and computational performance against heterogeneous and cutting-edge scalar-parallel systems.
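
As context for the HPCG discussion: HPCG's run time is dominated by sparse matrix-vector products and similar kernels of low arithmetic intensity. The following minimal CRS-format SpMV is an illustrative sketch, not the HPCG reference code, and all identifiers are ours; it shows why the benchmark rewards balanced memory and computational performance rather than peak flop/s alone.

    /* Minimal sparse matrix-vector product in CRS (CSR) format, the kind
     * of kernel that dominates HPCG's run time.  Illustrative sketch, not
     * the HPCG reference implementation.  Per nonzero it performs 2 flops
     * but moves at least 12 bytes (an 8-byte value plus a 4-byte column
     * index), so its code B/F ratio is at least 6, well above the machine
     * B/F ratio of current systems. */
    #include <stddef.h>

    void spmv_crs(size_t nrows,
                  const size_t *row_ptr,  /* nrows+1 row offsets          */
                  const int    *col_idx,  /* column index of each nonzero */
                  const double *val,      /* value of each nonzero        */
                  const double *x,        /* input vector                 */
                  double       *y)        /* output vector, y = A*x       */
    {
        for (size_t i = 0; i < nrows; i++) {
            double sum = 0.0;
            for (size_t k = row_ptr[i]; k < row_ptr[i + 1]; k++)
                sum += val[k] * x[col_idx[k]];  /* 2 flops, >= 12 bytes */
            y[i] = sum;
        }
    }

The indirect access x[col_idx[k]] is exactly the pattern that benefits from gather-capable vector loads and high memory bandwidth.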

Highlights

  • Supercomputers have become requisite facilities for accelerating various kinds of simulations in the fields of science, engineering, and economics

  • The actual B/F ratio of the case without the Assignable Data Buffer (ADB) and Miss Status Handling Registers (MSHR) becomes larger than the code B/F ratio. MSHR and ADB together effectively reduce the actual B/F ratio by keeping reusable data in the ADB and cutting down redundant memory accesses. These results demonstrate that the sustained performance of real memory-intensive applications can be boosted by the collaboration of ADB and MSHR, as illustrated by the model sketched after this list

  • Since the High Performance LINPACK (HPL) benchmark, a highly scalable MPI program with computation-intensive kernels, is used to rank systems in the TOP500 list, HPL results come close to the theoretical peak performance of supercomputers [23]
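
A back-of-the-envelope model of the ADB/MSHR effect described in the second highlight: for a 3-point stencil, retaining already-fetched elements in an on-chip buffer roughly halves the traffic to memory and hence the actual B/F ratio. The access counts below are illustrative assumptions, not SX-ACE measurements.

    /* Model of how an on-chip buffer such as the ADB lowers the actual
     * B/F ratio of a 3-point stencil
     *     y[i] = (x[i-1] + x[i] + x[i+1]) / 3.0;
     * Access counts are illustrative assumptions, not measurements. */
    #include <stdio.h>

    int main(void)
    {
        const double flops_per_iter = 3.0;  /* 2 adds + 1 divide */
        const double word           = 8.0;  /* bytes per double  */

        /* No buffer: every reference goes to memory,
         * 3 loads + 1 store = 4 words per iteration. */
        double bytes_no_buffer = 4.0 * word;

        /* With a buffer: x[i-1] and x[i] were fetched by earlier
         * iterations and hit on chip, so 1 load + 1 store = 2 words. */
        double bytes_buffer = 2.0 * word;

        printf("actual B/F without buffer: %.2f\n",
               bytes_no_buffer / flops_per_iter);  /* 10.67 */
        printf("actual B/F with buffer:    %.2f\n",
               bytes_buffer / flops_per_iter);     /*  5.33 */
        return 0;
    }

Halving the memory traffic doubles the performance a bandwidth-bound kernel can attain at a fixed memory bandwidth, which is why the collaboration of ADB and MSHR shows up directly in sustained performance.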

Introduction

Supercomputers have become requisite facilities for accelerating various kinds of simulations in the fields of science, engineering, and economics. To satisfy the ever-increasing demands of computational scientists for higher computational capability, the peak performance of supercomputers has been improved drastically. Thanks to technology scaling and the maturation of many-core architectures, including accelerators, the theoretical peak performance of the world's fastest supercomputer reaches 125 petaflop/s (Pflop/s) [1]. However, mainly due to the memory wall problem and the overhead of handling the massive parallelism of the system, there is a large gap between the theoretical and sustained performances of recent supercomputers on practical applications. As clearly demonstrated in previous research efforts [2], keeping a high ratio of memory bandwidth (bytes/s) to floating-point performance (flop/s), known as the Bytes-per-Flop (B/F) ratio, is a key factor in achieving high sustained performance [3,4].
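
To make the role of the B/F ratio concrete, the following sketch estimates the attainable performance of a memory-bound kernel in the roofline style. The machine figures are hypothetical placeholders, not those of any specific system evaluated in this paper.

    /* Roofline-style estimate of attainable performance from the B/F
     * ratios discussed above.  The machine figures are hypothetical
     * placeholders; substitute those of a real system. */
    #include <stdio.h>

    int main(void)
    {
        const double peak_gflops = 256.0;  /* hypothetical peak, Gflop/s   */
        const double mem_bw_gbs  = 256.0;  /* hypothetical bandwidth, GB/s */
        const double machine_bf  = mem_bw_gbs / peak_gflops;  /* 1.0 */

        /* DAXPY, y[i] = a * x[i] + y[i]: 2 flops versus 2 loads + 1 store
         * = 24 bytes per iteration, so its code B/F ratio is 12. */
        const double code_bf = 24.0 / 2.0;

        /* A memory-bound kernel (code B/F > machine B/F) attains at most
         * peak * machine_bf / code_bf. */
        double attainable = peak_gflops * machine_bf / code_bf;
        printf("machine B/F = %.2f, code B/F = %.2f\n", machine_bf, code_bf);
        printf("attainable = %.1f Gflop/s (%.1f%% of peak)\n",
               attainable, 100.0 * attainable / peak_gflops);
        return 0;
    }

With these placeholder numbers the kernel attains roughly 21 Gflop/s, about 8% of peak, which illustrates the kind of gap between theoretical and sustained performance that this paper addresses.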
