Keyword search has been widely studied as a way to retrieve relevant substructures from graphs for a given set of keywords. However, existing approaches aim at finding compact trees or subgraphs containing the keywords, and ignore density, a critical measure of how strongly and stably the keyword nodes are connected within the substructure. In this paper, given a set of keywords <inline-formula><tex-math notation="LaTeX">$Q = \lbrace w_1, w_2, \ldots, w_l\rbrace$</tex-math></inline-formula>, we study the problem of finding a cohesive subgraph containing <inline-formula><tex-math notation="LaTeX">$Q$</tex-math></inline-formula> with high density and compactness from a graph <inline-formula><tex-math notation="LaTeX">$G$</tex-math></inline-formula>. We model the cohesive subgraph based on a carefully chosen <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula>-truss model, and formulate the problem of finding cohesive subgraphs for keyword queries as the <i>minimal dense truss</i> search problem, i.e., finding a minimal subgraph that covers <inline-formula><tex-math notation="LaTeX">$Q$</tex-math></inline-formula> and maximizes the trussness. However, unlike <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula>-truss based community search, which can be done efficiently by local search from a given set of nodes, <i>minimal dense truss</i> search for keyword queries is a nontrivial task, because the subset of keyword nodes to be included in the retrieved substructure is not known in advance. To tackle this problem, we first design a novel hybrid KT-Index that stores the keyword and truss information compactly, and then propose an efficient algorithm that searches the KT-Index directly to find the dense truss <inline-formula><tex-math notation="LaTeX">$G_{den}$</tex-math></inline-formula> with the maximum trussness, without repeated accesses to the original graph.
Then, we develop a novel refinement approach that extracts a minimal dense truss from the dense truss <inline-formula><tex-math notation="LaTeX">$G_{den}$</tex-math></inline-formula> by checking each node at most once, based on the anti-monotonicity property derived from the <inline-formula><tex-math notation="LaTeX">$k$</tex-math></inline-formula>-truss, together with several optimization strategies including batch-based deletion, early-stop-based deletion, and local exploration. Moreover, we extend the proposed method to handle top-<inline-formula><tex-math notation="LaTeX">$r$</tex-math></inline-formula> search. Extensive experimental studies on real-world networks validate the effectiveness and efficiency of our approaches.
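To make the notion of a dense truss with maximum trussness concrete, the following is a minimal Python sketch under stated assumptions. It is not the KT-Index based algorithm summarized above: it uses the standard peeling-based truss decomposition (an edge belongs to a k-truss if it lies in at least k - 2 triangles of that subgraph) and then scans trussness values from high to low for a connected component whose nodes jointly cover the query keywords. The function names `truss_decomposition` and `max_truss_covering`, the `keywords_of` mapping, and the requirement of plain connectivity are illustrative assumptions, and the sketch does not perform the minimality refinement.

```python
def truss_decomposition(edges):
    """Return {frozenset({u, v}): trussness} via standard edge peeling."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Support of an edge = number of triangles that contain it.
    support = {}
    for u in adj:
        for v in adj[u]:
            e = frozenset((u, v))
            if e not in support:
                support[e] = len(adj[u] & adj[v])

    trussness = {}
    k = 2
    while support:
        # Edges with support <= k - 2 cannot belong to a (k + 1)-truss.
        queue = [e for e, s in support.items() if s <= k - 2]
        while queue:
            e = queue.pop()
            if e not in support:
                continue  # already peeled (duplicate queue entry)
            u, v = tuple(e)
            for w in adj[u] & adj[v]:
                # Removing (u, v) destroys the triangle (u, v, w).
                for f in (frozenset((u, w)), frozenset((v, w))):
                    support[f] -= 1
                    if support[f] <= k - 2:
                        queue.append(f)
            adj[u].discard(v)
            adj[v].discard(u)
            del support[e]
            trussness[e] = k
        k += 1
    return trussness


def max_truss_covering(edges, keywords_of, query):
    """Return (k, nodes) for the largest k whose k-truss has a connected
    component covering every keyword in `query`, or None if none exists.

    `keywords_of` maps node -> set of keywords (hypothetical input format).
    """
    truss = truss_decomposition(edges)
    for k in sorted(set(truss.values()), reverse=True):
        # Edges with trussness >= k form the maximal k-truss.
        adj = {}
        for e, t in truss.items():
            if t >= k:
                u, v = tuple(e)
                adj.setdefault(u, set()).add(v)
                adj.setdefault(v, set()).add(u)
        seen = set()
        for start in adj:
            if start in seen:
                continue
            comp, stack = {start}, [start]
            while stack:  # depth-first search over one component
                x = stack.pop()
                for y in adj[x]:
                    if y not in comp:
                        comp.add(y)
                        stack.append(y)
            seen |= comp
            covered = set().union(*(keywords_of.get(n, set()) for n in comp))
            if set(query) <= covered:
                return k, comp
    return None
```

For example, on a 4-clique over nodes 1 to 4 with an extra edge (4, 5), a query whose keywords appear only on clique nodes is answered by the 4-truss, whereas a query that also needs a keyword held only by node 5 falls back to trussness 2. This sketch recomputes components on the original graph for every k, which is exactly the kind of repeated graph access that an index is meant to avoid.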