Combining reduction with synchronization barrier on multi‐core processors

Aboul‐Karim Mohamed El Maarouf, Luc Giraud, Abdou Guermouche, Thomas Guignon

doi:10.1002/cpe.7402

Abstract

With the rise of multi‐core processors with a large number of cores, the need for shared memory reduction that performs efficiently on a large number of cores is increasingly pressing. Efficient shared memory reduction on these multi‐core processors will help shared memory programs run more efficiently. In this article, we propose a reduction method combined with a barrier method that uses SIMD read/write instructions to carry the barrier signal and the reduction value together, which minimizes memory/cache traffic between cores and thereby reduces barrier latency. We compare different barrier and reduction methods on three multi‐core processors and show that the proposed combined barrier/reduction methods are 4 and 3.5 times faster than the OpenMP 4.5 reductions of GCC 11.1 and Intel 21.2, respectively.
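The idea of carrying the barrier signal and the reduction value in the same memory write can be illustrated with a minimal sketch. It is not the authors' implementation: where the abstract refers to SIMD read/write instructions, this sketch assumes a single 64-bit atomic word that packs a 32-bit epoch counter (the barrier signal) with a 32-bit float (the reduction value), and it uses a flat, centralized barrier in which thread 0 gathers the contributions. The names `Slot`, `CombinedBarrierReduce`, and `run_demo` are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <thread>
#include <vector>

// One cache line per slot, so arrivals from different threads do not false-share.
struct alignas(64) Slot {
    // High 32 bits: epoch (barrier signal). Low 32 bits: float payload.
    std::atomic<std::uint64_t> word{0};
};

static std::uint64_t pack(std::uint32_t epoch, float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return (static_cast<std::uint64_t>(epoch) << 32) | bits;
}

static std::uint32_t epoch_of(std::uint64_t w) {
    return static_cast<std::uint32_t>(w >> 32);
}

static float value_of(std::uint64_t w) {
    std::uint32_t bits = static_cast<std::uint32_t>(w);
    float v;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}

// Assumption: a flat barrier with thread 0 as the gathering thread.
class CombinedBarrierReduce {
public:
    explicit CombinedBarrierReduce(int n) : arrive_(n), n_(n) { assert(n > 0); }

    // Called by every thread. `epoch` must be identical on all threads and
    // increase by 1 per call, starting at 1 (0 is the initial "not arrived" state).
    float sum(int tid, std::uint32_t epoch, float contribution) {
        // A single store both signals arrival and delivers the value.
        arrive_[tid].word.store(pack(epoch, contribution), std::memory_order_release);
        std::uint64_t w;
        if (tid == 0) {
            float total = 0.0f;
            for (int i = 0; i < n_; ++i) {
                while (epoch_of(w = arrive_[i].word.load(std::memory_order_acquire)) != epoch)
                    std::this_thread::yield();
                total += value_of(w);
            }
            // A single store both releases the waiters and publishes the result.
            release_.word.store(pack(epoch, total), std::memory_order_release);
            return total;
        }
        while (epoch_of(w = release_.word.load(std::memory_order_acquire)) != epoch)
            std::this_thread::yield();
        return value_of(w);
    }

private:
    std::vector<Slot> arrive_;
    Slot release_;
    int n_;
};

// Hypothetical driver: thread t contributes t + 1 in every round. Returns true
// if every thread observed the expected sum in every round.
bool run_demo(int nthreads, int rounds) {
    CombinedBarrierReduce br(nthreads);
    std::atomic<bool> ok{true};
    const float expected = nthreads * (nthreads + 1) / 2.0f;
    std::vector<std::thread> pool;
    for (int t = 0; t < nthreads; ++t)
        pool.emplace_back([&br, &ok, expected, rounds, t] {
            for (int r = 1; r <= rounds; ++r)
                if (br.sum(t, static_cast<std::uint32_t>(r), float(t + 1)) != expected)
                    ok = false;
        });
    for (auto& th : pool) th.join();
    return ok;
}
```

Because the flag and the value travel in one word, a waiting thread that sees the expected epoch also holds the value, so no second cache line has to be fetched. A wider SIMD store, as described in the abstract, would presumably allow larger or multiple reduction values per signal; that variant is not shown here.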

Full Text