Performance Optimization of the HPCG Benchmark on the Sunway TaihuLight Supercomputer

Yulong Ao, Chao Yang, Qiao Sun, Fangfang Liu, Lijuan Jiang, Wanwang Yin

doi:10.1145/3182177

Open Access

https://doi.org/10.1145/3182177

Journal: ACM Transactions on Architecture and Code Optimization
Publication Date: Mar 22, 2018
Citations: 24

Affiliation: Chinese Academy of Sciences, Peking University

Abstract

In this article, we present some key techniques for optimizing HPCG on Sunway TaihuLight and demonstrate how to achieve high performance in memory-bound applications by exploiting specific characteristics of the hardware architecture. In particular, we utilize a block multicoloring approach for parallelization and propose methods such as requirement-based data mapping and customized gather collective to enhance the effective memory bandwidth. Experiments indicate that the optimized HPCG code can sustain 77% of the theoretical memory bandwidth and scale to the full system of more than 10 million cores, with an aggregated performance of 480.8 Tflop/s and a weak scaling efficiency of 87.3%.
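
The abstract names a block multicoloring approach without describing it. As a minimal sketch of the general idea only, assume a cubic 3D grid with a 27-point stencil (as in the HPCG reference problem) split into cubic blocks, and an 8-color scheme based on the parity of the block coordinates. The scheme, the function names, and the parameters `n` and `b` are illustrative assumptions, not the authors' implementation. Blocks that share a color never touch, so they could be smoothed in parallel within one color sweep.

```python
from itertools import product


def block_color(bx, by, bz):
    """Return a color in 0..7 from the parity of the block coordinates (assumed scheme)."""
    return (bx % 2) + 2 * (by % 2) + 4 * (bz % 2)


def blocks_by_color(n, b):
    """Group the blocks of an n x n x n grid (block edge length b) by color."""
    nb = (n + b - 1) // b  # number of blocks per dimension, rounded up
    groups = {c: [] for c in range(8)}
    for bx, by, bz in product(range(nb), repeat=3):
        groups[block_color(bx, by, bz)].append((bx, by, bz))
    return groups


def blocks_touch(p, q):
    """True if two distinct blocks are neighbors under a 27-point stencil."""
    return p != q and max(abs(a - c) for a, c in zip(p, q)) <= 1


# Hypothetical sizes chosen for illustration: an 8^3 grid with 2^3 blocks.
groups = blocks_by_color(n=8, b=2)
```

Each color group can then be processed one after another, with the blocks inside a group distributed across cores; within a block, points would still be updated sequentially to preserve the Gauss-Seidel ordering.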
