Enhanced subgraph matching for large graphs using candidate region-based decomposition and ordering

Zubair Ali Ansari, Md Aslam Parwez, Irfan Rashid Thoker, Jahiruddin Jahiruddin

doi:10.1016/j.jksuci.2023.101694

Abstract

Subgraph matching on large graphs is an emerging research challenge in graph search, driven by the growing size of web, social, and metabolic graphs and by the wide availability of graph databases. The problem is to find all instances (also known as embeddings) of a small query graph in an associated large reference graph. Many state-of-the-art algorithms, including VF3, RI, CFL-Match, and Glasgow, exist to solve the subgraph matching problem. RI is one of the fastest subgraph matching algorithms, and it focuses mainly on time efficiency as a performance measure. However, other performance measures, such as the number of found instances of the query graph (embedding count), the method of ordering the query graph’s vertices, and the number of recursive calls, are also crucial for the efficiency and effectiveness of subgraph matching. In this paper, the RI+ algorithm is proposed as an enhanced version of RI, designed using candidate region-based decomposition and ordering. Three novel candidate region orderings are introduced, namely vertex-count, density, and average-path-length, each based on the structural properties of the candidate regions. An empirical analysis on real-world data sets shows that RI+ significantly improves on RI in efficiency and effectiveness on both performance evaluation measures, namely embedding count and search time. The influence of the proposed candidate region orderings on the search time of RI+ was also analyzed, revealing that a suitable candidate region ordering has the potential to improve the search time of the proposed algorithm.
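To make two of the performance measures named in the abstract concrete (embedding count and number of recursive calls), the following is a minimal backtracking sketch of subgraph matching. It is not RI or RI+: it assumes undirected, unlabeled graphs stored as adjacency sets, non-induced matching with an injective vertex mapping, and a simple highest-degree-first query ordering chosen purely for illustration. The name `count_embeddings` and the `stats` dictionary are hypothetical.

```python
from typing import Dict, Set

# Assumed representation: undirected graph as vertex -> set of neighbours.
Graph = Dict[int, Set[int]]


def count_embeddings(query: Graph, target: Graph) -> Dict[str, int]:
    """Count embeddings of `query` in `target` by plain backtracking.

    Illustrative only: the vertex ordering below (highest degree first)
    is a stand-in and is not the ordering used by RI or RI+.
    """
    order = sorted(query, key=lambda v: -len(query[v]))
    stats = {"embeddings": 0, "recursive_calls": 0}
    mapping: Dict[int, int] = {}  # query vertex -> target vertex
    used: Set[int] = set()        # target vertices already matched

    def extend(depth: int) -> None:
        stats["recursive_calls"] += 1
        if depth == len(order):
            # Every query vertex is mapped: one embedding found.
            stats["embeddings"] += 1
            return
        u = order[depth]
        for t in target:
            # Injectivity check plus a cheap degree filter.
            if t in used or len(target[t]) < len(query[u]):
                continue
            # Each already-mapped neighbour of u must map to a neighbour of t.
            if all(mapping[n] in target[t] for n in query[u] if n in mapping):
                mapping[u] = t
                used.add(t)
                extend(depth + 1)
                del mapping[u]
                used.discard(t)

    extend(0)
    return stats


# Example: a triangle query against the complete graph on four vertices.
triangle: Graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
k4: Graph = {i: {j for j in range(4) if j != i} for i in range(4)}
result = count_embeddings(triangle, k4)
```

For the triangle query on the complete graph with four vertices, `result["embeddings"]` is 24, since every injective assignment of the three query vertices is a valid match. Changing the `order` line changes `result["recursive_calls"]` on less symmetric inputs without changing the embedding count, which is why the ordering of query vertices is treated as a performance factor in its own right.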

Full Text