Abstract

Overhead caused by data consistency issues in inter-thread synchronization can degrade the performance of parallel applications. Non-Uniform Memory Access (NUMA), the mainstream architecture of today's multicore processors, further exacerbates this issue because of the significant overhead incurred by Remote Memory References (RMRs). Reducing synchronization overhead therefore requires addressing the data consistency issue. In this paper, we classify the overhead into two kinds: (1) overhead incurred by the synchronization algorithms themselves, and (2) overhead incurred by critical sections. To reduce these two kinds of overhead on NUMA machines, we present two optimization strategies, called search and backtrace (SAB) and reorder critical section and non-critical section (RCAN), respectively. In SAB, the server thread searches for a waiting thread that comes from the master NUMA node and designates it as the new server thread. In this way, the shared data resides in the cache of the master NUMA node most of the time, lowering the data-consistency overhead within critical sections. In RCAN, each thread consecutively posts synchronization requests and then consecutively executes its non-critical sections. In this way, server threads can serve enough requests at a time, resulting in better data locality. Based on SAB, we design an algorithm named R-Synch; based on RCAN, we design an algorithm named H-STA. Our evaluation against representative synchronization algorithms demonstrates the effectiveness of R-Synch and H-STA.
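
The abstract does not specify the interfaces involved, but the two strategies can be illustrated with short C sketches. The first is a minimal sketch of the SAB hand-off idea, assuming a combining-style lock in which one "server" thread executes queued critical sections on behalf of waiters; all names here (slot_t, MASTER_NODE, serve_and_backtrace) are our own illustration, not the paper's actual API.

/* Sketch of SAB: serve pending requests, then try to hand the server
 * role to a waiter on the master NUMA node, so the shared data keeps
 * residing in that node's cache. Hypothetical names throughout.      */
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NTHREADS    64
#define MASTER_NODE 0          /* NUMA node whose cache holds the shared data */

typedef struct {
    atomic_bool pending;        /* request posted, not yet served  */
    int         numa_node;      /* node the owning thread runs on  */
    void      (*cs)(void *);    /* critical section to execute     */
    void       *arg;
} slot_t;

static slot_t slots[NTHREADS];

/* Returns the id of the designated new server thread, or -1 if the
 * caller should keep serving. New requests may be posted concurrently,
 * so the second scan can find waiters that arrived during serving.   */
static int serve_and_backtrace(int self)
{
    for (int i = 0; i < NTHREADS; i++) {
        if (atomic_load(&slots[i].pending)) {
            slots[i].cs(slots[i].arg);              /* run CS for thread i */
            atomic_store(&slots[i].pending, false);
        }
    }
    /* SAB step: search for a waiter from the master NUMA node and
     * designate it as the next server.                               */
    for (int i = 0; i < NTHREADS; i++) {
        if (i != self && slots[i].numa_node == MASTER_NODE &&
            atomic_load(&slots[i].pending))
            return i;
    }
    return -1;
}

The second is a minimal sketch of RCAN's reordering, assuming a hypothetical asynchronous delegation interface (post_request enqueues a critical section for the server thread; wait_done blocks until it has been executed; BATCH is a tunable batch size). Instead of the conventional CS; NCS; CS; NCS interleaving, each thread posts a batch of requests back to back and then runs the matching non-critical sections consecutively, which gives the server enough requests to combine.

/* Hypothetical delegation interface; names are ours, not the paper's. */
typedef int handle_t;
extern handle_t post_request(void (*cs)(void *), void *arg);
extern void     wait_done(handle_t h);
extern void     critical_section(void *arg);
extern void     non_critical_section(int i);

#define BATCH 4   /* number of requests posted back to back (tunable) */

void rcan_iteration(void)
{
    handle_t h[BATCH];
    for (int i = 0; i < BATCH; i++)
        h[i] = post_request(critical_section, NULL); /* post consecutively */
    for (int i = 0; i < BATCH; i++)
        non_critical_section(i);                     /* NCS consecutively,
                                                        overlapping server work */
    for (int i = 0; i < BATCH; i++)
        wait_done(h[i]);                             /* collect completions */
}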
