Since recent scientific and engineering simulations require heavy computations with large volumes of data, High-performance Computing (HPC) systems need a high computational capability with a large memory capacity. Most recent HPC systems adopt a parallel processing architecture, where the computational capability of the processors is increasing, however, the performance of the memory system is constrained. The bytes per flop (B/F), which is a ratio of the memory bandwidth to the flop/s, for the HPC systems have been reduced with the evolution of the HPC systems. To fully exploit the potential of the recent HPC systems, and to meet the increasing demand for large memory, it is necessary to optimize practical scientific and engineering applications, considering not only the parallelism of the applications, but also the limitations of the memory subsystems of the HPC systems. In this paper, we discuss a set of approaches to optimization of the memory access behavior of the applications, which enable their executions with improved performance on the recent HPC systems. Our approaches include memory optimizations through memory footprint controlling, restructuring of data structures for active elements, redundant data structure elimination through combined calculations and optimized re-calculation of data. To validate the effectiveness of our approaches, a plasmonics simulation application is evaluated on vector platforms NEC SX-ACE, NEC SX-9, and Intel Xeon based platform NEC LX 406-Re2. By applying our approaches to the implementation, the memory usage of the plasmonics simulation application can be reduced up to nearly 1/71 of the original, and its execution can be possible on a single node of a distributed parallel system with smaller memory capacity. The optimization results in 1.14 times faster execution on SX-ACE and 1.81 times faster execution on LX 406-Re2.
Read full abstract