Graph pattern matching is a powerful primitive with applications across many domains. Despite recent algorithmic advances, matching patterns in large-scale real-world graphs is still bottlenecked by memory access on conventional computing systems. Processing-in-memory (PIM) is an emerging hardware architecture paradigm that places computing cores inside memory devices to alleviate the memory wall. Real PIM hardware has recently become commercially available. In this work, we use a real PIM hardware platform to build a graph pattern matching framework, PimPam, that benefits from the platform's abundant computation and memory bandwidth resources. We propose four key optimizations in PimPam to improve its efficiency: (1) load-aware task assignment to ensure load balance, (2) space-efficient and parallel data partitioning to prepare input data for PIM cores, (3) adaptive multi-threading collaboration to automatically select the best parallelization strategy during processing, and (4) dynamic bitmap structures that accelerate set intersection, the key operation in pattern matching. When evaluated on five patterns and six real-world graphs, PimPam outperforms the state-of-the-art CPU baseline system by 22.5x on average and by up to 71.7x.
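To illustrate why set intersection dominates pattern matching and how a bitmap can speed it up, the following is a minimal sketch that counts triangles, the simplest pattern. It is not PimPam's implementation: the abstract does not describe how the dynamic bitmap structures work, so the fixed bitmap built from Python integers, the function names (`to_bitmap`, `intersect_sorted`, `count_triangles`), and the example graph are all assumptions made for illustration.

```python
def to_bitmap(neighbors):
    """Pack vertex ids into an integer used as a bitmap (bit v set = v present)."""
    bits = 0
    for v in neighbors:
        bits |= 1 << v
    return bits


def intersect_sorted(a, b):
    """Baseline: merge-style intersection of two sorted id lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def count_triangles(adj):
    """Count triangles u < v < w with one bitmap intersection per edge.

    adj maps each vertex to the list of its neighbors (undirected graph).
    Each bitmap keeps only higher-numbered neighbors, so every triangle
    is counted exactly once.
    """
    bitmaps = {u: to_bitmap(n for n in nbrs if n > u) for u, nbrs in adj.items()}
    total = 0
    for u, nbrs in adj.items():
        for v in nbrs:
            if v > u:
                # Common higher-numbered neighbors of u and v close a triangle.
                total += bin(bitmaps[u] & bitmaps[v]).count("1")
    return total


# Hypothetical 4-vertex graph with triangles (0, 1, 2) and (0, 2, 3).
example_adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
triangles = count_triangles(example_adj)
```

The bitmap version replaces the element-by-element merge with a single bitwise AND per edge, at the cost of memory proportional to the largest vertex id. That trade-off is presumably why a real system would choose between representations at run time rather than always using bitmaps.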