Abstract

Many join-based graph mining (JGM) algorithms, such as triangle listing and clique enumeration, typically output data of such a large size that it often dominates the mining cost. Despite the significance of the I/O cost, only a few research focused on reducing the size of the output data. However, those techniques have limitation that they are highly specific to their corresponding graph mining algorithms. In this paper, through careful observations of output patterns, we propose a general compression solution that can be applied to an arbitrary JGM algorithm. The proposed technique first categorizes the nodes of a set of cliques into two lists, which contain overlapping and non-overlapping nodes, respectively. Then, we remove the redundant nodes from multiple sets of cliques by creating a Clique Constructor Tree using the overlapping and non-overlapping nodes. Finally, we compress the clique constructor trees by applying a novel flag-aligned word encoding technique. Our proposed technique performs compression on-the-fly and can easily be adopted by various JGM algorithms. Experiments on real datasets show that our proposed technique, which is adopted in a triangle listing algorithm, reduces the size of the output data and running time by seven times and about four times, respectively. Experimental results also show that a maximal clique listing algorithm can reduce the size of the output data by a factor of three.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call