Dissecting the Phytium 2000+ Memory Hierarchy via Microbenchmarking

Wanrong Gao,Chuanfu Xu,Chun Huang,Jianbin Fang

doi:10.1007/978-981-15-8135-9_11

Abstract

An efficient use of the memory system on multi-cores is critical to improving data locality and achieving better program performance. But the hierarchical memory system with caches often works in a “black-box” manner, which automatically moves data across memory layers, and makes code optimization a daunting task. In this article, we dissect the memory system of the Phytium 2000+ many-core with micro-benchmarks. We measure the latency and bandwidth of moving cachelines across memory levels on a single core or two distinct cores. We design a set of micro-benchmarks by using the pointer-chasing method to measure latency, and using the chunk-accessing method to measure bandwidth. During measurement, we have to place the cacheline on the specified memory layer and set its initial consistency state. The experimental results on Phytium 2000+ provide a quantified form of its actual memory performance, and reveal undocumented performance data and micro-architectural details. To conclude, our work will provide quantitative guidelines for optimizing the Phytium 2000+ memory accesses.

Full Text