Vlock: Lock virtualization mechanism for exploiting fine-grained parallelism in graph traversal algorithms

Jie Yan Jie Yan,Guangming Tan Guangming Tan,Erlin Yao Erlin Yao,Xiuxia Zhang Xiuxia Zhang,Ninghui Sun Ninghui Sun

doi:10.1109/cgo.2013.6494991

Abstract

For graph traversal applications, fine synchronization is required to exploit massive fine parallelism. However, in the conventional solution using fine-grained locks, locks themselves suffer huge memory cost as well as poor locality for inherent irregular access to vertices. In this paper, we propose a novel fine lock solution-vLock. The key idea is lock virtualization that maps the huge logical lock space to a much smaller physical lock space that can reside in cache during the program life cycle. Lock virtualization effectively reduces lock incurred overheads of both memory cost and cache misses. It also achieves high usability in legacy graph programs, as from users's view vLock is the same as lock methods in Pthreads. We implement vLock as a Pthreads-like library and evaluate its performance in four classical graph algorithms (BFS, SSSP, CC, PageRank). Experiments on a SMP system with two Intel Westemere six-core processors show that, compared to conventional fine locks, vLock significantly reduces locks' cache misses and has competitive performance. Particularly, PageRank with vLock has about 20% performance improvement.

Full Text