Optimization of Computation-Intensive Applications in cc-NUMA Architecture

Ming Zhang, Kaixin Ren, Naijie Gu

doi:10.1109/nana.2016.12

Abstract

In Cache Coherent Non-Uniform Memory Access (cc-NUMA) architectures, remote memory access has lower bandwidth and higher latency than local memory access. The gap is especially large on cc-NUMA platforms whose computing nodes are connected by a network, because the network's latency and bandwidth are much worse than those of the HyperTransport (HT) and PCI-Express (PCI-E) buses. To improve application performance, a Hybrid Parallel Framework for Computation-intensive Applications (HPFCA) was proposed. HPFCA addresses task distribution, data storage, multicore parallelism, and kernel optimization. An "MPI+OpenMP/Pthreads" mechanism was used on multi-node platforms: MPI provided distributed-memory parallelism, and OpenMP/Pthreads provided shared-memory parallelism. GEMM and FFT, representative computation-intensive applications on the Godson-3B, were studied, and their parallel algorithms were optimized according to HPFCA. Experimental results demonstrated that HPFCA could achieve ideal performance on the Godson-3B.
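The two-level decomposition described in the abstract can be illustrated with a minimal C sketch. It is not the authors' HPFCA implementation. It assumes a row-major GEMM whose rows of C are split across processes (the distributed-memory level) and then across threads with OpenMP (the shared-memory level). The names `row_range`, `rows_for_rank`, and `gemm_local` are hypothetical, and the MPI calls are left out so the sketch stays self-contained: `rank` and `nprocs` are plain parameters standing in for the values MPI_Comm_rank and MPI_Comm_size would return.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of rows of C owned by one process.
   Hypothetical helper type, not taken from the paper. */
typedef struct {
    size_t begin;
    size_t end;
} row_range;

/* Distributed-memory level: split n_rows as evenly as possible over
   nprocs processes. In a real MPI program, rank and nprocs would come
   from MPI_Comm_rank and MPI_Comm_size (assumption: omitted here). */
row_range rows_for_rank(size_t n_rows, int rank, int nprocs)
{
    assert(nprocs > 0 && rank >= 0 && rank < nprocs);
    size_t base = n_rows / (size_t)nprocs;   /* rows every rank gets   */
    size_t extra = n_rows % (size_t)nprocs;  /* first ranks get 1 more */
    size_t r = (size_t)rank;
    row_range rr;
    rr.begin = r * base + (r < extra ? r : extra);
    rr.end = rr.begin + base + (r < extra ? 1 : 0);
    return rr;
}

/* Shared-memory level: C[i][:] += A[i][:] * B for the local rows only.
   A is M x k, B is k x n, C is M x n, all row-major. Each thread works
   on distinct rows of C, so no synchronization is needed. If OpenMP is
   not enabled, the pragma is ignored and the loop runs serially. */
void gemm_local(row_range rows, size_t k, size_t n,
                const double *a, const double *b, double *c)
{
    #pragma omp parallel for
    for (size_t i = rows.begin; i < rows.end; ++i) {
        for (size_t p = 0; p < k; ++p) {
            double a_ip = a[i * k + p];
            for (size_t j = 0; j < n; ++j) {
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
}
```

Under these assumptions, each process would call `rows_for_rank` once and then run `gemm_local` on its own rows. Partitioning by rows of C keeps every write local to the owning process and thread, which is one simple way to limit the remote memory traffic the abstract identifies as the bottleneck.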
