Abstract

As the level of parallelism in manycore processors keeps increasing, providing efficient mechanisms for thread synchronization in concurrent programs is becoming a major concern. On cache-coherent shared-memory processors, synchronization efficiency is ultimately limited by the performance of the underlying cache coherence protocol. This article studies how hardware support for message passing can improve synchronization performance. Considering the ubiquitous problem of mutual exclusion, we devise novel algorithms for (i) classic locking, where application threads obtain exclusive access to a shared resource prior to executing their critical sections (CSes), and (ii) delegation, where CSes are executed by special threads. For classic locking, our HybLock algorithm uses a mix of shared memory and hardware message passing, which introduces the idea of hybrid synchronization algorithms. For delegation, we propose mp-server and HybComb : the former is a straightforward adaptation of the server approach to hardware message passing, whereas the latter is a novel hybrid combining algorithm. Evaluation on Tilera's TILE-Gx processor shows that HybLock outperforms the best known classic locks. Furthermore, mp-server can execute contended CSes with unprecedented throughput, as stalls related to cache coherence are removed from the critical path. HybComb can achieve comparable performance while avoiding the need to dedicate server cores. Consequently, our queue and stack implementations, based on the new synchronization algorithms, largely outperform their most efficient shared-memory-only counterparts.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call