Resling: a scalable and generic framework to mine top-k representative subgraph patterns

Dheepikaa Natarajan,Sayan Ranu

doi:10.1007/s10115-017-1129-y

Abstract

Mining subgraph patterns is an active area of research due to its wide-ranging applications. Examples include frequent subgraph mining, discriminative subgraph mining, statistically significant subgraphs. Existing research has primarily focused on mining all subgraph patterns in the database. However, due to the exponential subgraph search space, the number of patterns mined, typically, is too large for any human-mediated analysis. Consequently, deriving insights from the mined patterns is hard for domain scientists. In addition, subgraph pattern mining is posed in multiple forms: the function that models if a subgraph is a pattern varies based on the application and the database could be over multiple graphs or a single, large graph. In this paper, we ask the following question: Given a subgraph importance function and a budget k, which are the k subgraph patterns that best represent all other patterns of interest? We show that the problem is NP-hard, and propose a generic framework called Resling that adapts to arbitrary subgraph importance functions and generalizable to both transactional graph databases as well as single, large graphs. Resling derives its power by structuring the search space in the form of an edit map, where each subgraph is a node, and two subgraphs are connected if they have an edit distance of one. We rank nodes in the edit map through two random walk based algorithms: vertex-reinforced random walks ( Resling -VR) and negative-reinforced random walks( Resling -NR). Experiments show that Resling-VR is up to 20 times more representative of the pattern space and two orders of magnitude faster than the state-of-the-art techniques. Resling-NR further improves the running time while maintaining comparable or better performance in representative power.

Full Text