An Unequal Caching Strategy for Shared-Memory Graph Analytics

Yuang Chen,Yeh-Ching Chung

doi:10.1109/tpds.2022.3218885

Abstract

Recent advances in computer architecture significantly enhance the computational capacity of multicore systems. It allows large-scale graphs to be processed inside a single machine. Nevertheless, the irregular processing pattern of graph-structured data constrains the hardware resources from being productively utilized. In this paper, we investigate the constraints in two aspects: workload imbalance and parallel inefficiency. When a graph analytics algorithm is multithreaded, the thread time is highly diversified, indicating an uneven work distribution. Also, the intensive thread contention lowers the computing capacity of CPU cores, thereby hindering the effective utilization of CPU resources. To address these challenges, we present a proactive graph caching strategy that unequally segments graph components into cache-able subsets of varying sizes, namely Syze. First, the computational loads of cache-sized subgraphs are estimated. Then, the demanding subgraphs are further subdivided until certain threshold is met. Moreover, during the propagation of updates, a fraction of vertex ID (i.e., several bits) are encoded to facilitate the communication between subgraphs. As a result, Syze is able to balance the workloads amongst the logical cores by shortening the longest thread execution time. Meanwhile, it alleviates thread contention and thus elevates the parallel efficiency of multicores. Compared with well-optimized Ligra, Gemini and GPOP, Syze achieves accelerations by up to <inline-formula><tex-math notation="LaTeX">$17.76\times$</tex-math></inline-formula> , <inline-formula><tex-math notation="LaTeX">$11.67\times$</tex-math></inline-formula> and <inline-formula><tex-math notation="LaTeX">$2.81\times$</tex-math></inline-formula> respectively. Additionally, the side effects of Syze are evaluated, including raised cache misses and memory accesses. They play a trivial role in deciding the overall performance, as their costs are far outweighed by the gains from the even distribution of workloads and the improved utilization of multicores.  Author: Please confirm or add details for any funding or financial support for the research of this article. ?>

Full Text