Due to the inherent hardness of subgraph isomorphism, its performance is often a bottleneck in real-world applications. We address this by designing an efficient subgraph isomorphism algorithm that leverages features of the GPU architecture. Existing GPU-based solutions adopt a two-step output scheme, performing the same join twice in order to write intermediate results concurrently. They also lack the GPU architecture-aware optimizations needed to scale to large graphs. In this paper, we propose a Scalable GPU-friendly subgraph isomorphism algorithm, SGSI. SGSI incorporates a Prealloc-Combine strategy based on the vertex-oriented framework, which avoids the joining twice required by existing solutions. It uses a GPU-friendly data structure (called PCSR) to represent an edge-labeled graph. We also study fine-grained load-balancing strategies and discuss how to handle enormous graphs that cannot reside in GPU memory. A partition-based pipeline framework is proposed. Extensive experiments on both synthetic and real graphs show that SGSI outperforms the state-of-the-art algorithms by up to several orders of magnitude and scales well, handling graphs with billions of edges.
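As background, the two-step output scheme attributed above to existing GPU-based solutions can be pictured with a small sequential Python sketch. It is an illustration under assumptions, not code from SGSI or from any of the compared systems: the names `extend`, `two_step_join`, and `adj` are hypothetical, the join is simplified to extending a partial match by one unlabeled neighbor, and on a GPU each row would be handled by parallel threads rather than a loop. The point it shows is that the join runs once to count results and a second time to write them.

```python
from itertools import accumulate


def extend(row, adj):
    """Hypothetical join step: extend a partial match by each neighbour
    of its last vertex that is not already part of the match."""
    return [v for v in adj[row[-1]] if v not in row]


def two_step_join(rows, adj):
    """Sequential sketch of a two-step (count, then write) output scheme."""
    # Pass 1: run the join only to count how many results each row produces.
    counts = [len(extend(row, adj)) for row in rows]

    # An exclusive prefix sum over the counts gives every row a private,
    # non-overlapping write offset into one shared output buffer.
    offsets = [0] + list(accumulate(counts))
    out = [None] * offsets[-1]

    # Pass 2: run the same join again, this time writing the results.
    for i, row in enumerate(rows):
        for k, v in enumerate(extend(row, adj)):
            out[offsets[i] + k] = row + (v,)
    return out


if __name__ == "__main__":
    # Assumed toy data graph: a triangle on vertices 0, 1, 2.
    adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
    print(two_step_join([(0,), (1,)], adj))
```

Because every row writes to a disjoint slice of the buffer, concurrent writers need no locks, but `extend` is evaluated twice per row. That duplicated join is the cost the abstract says the Prealloc-Combine strategy avoids.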