KTMiner: Distributed k-truss detection in big graphs

Mehdi Alemi, Hassan Haghighi

doi:10.1016/j.is.2019.03.014

Abstract

Discovering cohesive subgraphs is an important problem in the analysis of massive graphs. A relatively new type of cohesive subgraph, the k-truss, has gained considerable attention in recent years. Different methods have been proposed to extract k-truss subgraphs, but they cannot handle big graphs and are inefficient. To find k-truss subgraphs in big graphs, we propose KTMiner, a novel and efficient distributed algorithm based on the MapReduce paradigm and the key–value structure. KTMiner is deployed on Spark, a big data platform. Given a specific value of k, KTMiner finds the edges that belong to the k-truss subgraphs. It consists of three consecutive phases. First, a novel distributed k-core routine prunes unnecessary vertices from the input graph. Then, the triangle information of each edge is produced in the form of a new data structure called the Triangle Set (TSet). Finally, a distributed iterative procedure detects the desired k-truss subgraphs. KTMiner caches reusable data in distributed memory. In addition, its data structures are designed to provide efficient load balancing, which results in fine-grained parallelism. Experiments on real-world graphs show that our solution outperforms the state-of-the-art methods.
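The three phases summarized above can be illustrated with a minimal single-machine sketch in Python. It is not KTMiner's Spark implementation: there is no MapReduce, key–value partitioning, caching, or load balancing here, and the function names (`k_core_prune`, `build_tsets`, `k_truss`) are hypothetical. The sketch uses the standard definition of a k-truss (every retained edge lies in at least k-2 triangles of the retained subgraph) and assumes a simple undirected graph given as a set of edges. It also assumes the pruning phase keeps the (k-1)-core, which is safe because every vertex of a k-truss has degree at least k-1; the abstract does not state which core order KTMiner uses. Each TSet is modeled as the set of third vertices that close a triangle with an edge.

```python
from collections import defaultdict, deque

Edge = tuple[int, int]


def norm(u: int, v: int) -> Edge:
    """Store each undirected edge once, smaller endpoint first."""
    return (u, v) if u < v else (v, u)


def k_core_prune(edges: set[Edge], min_deg: int) -> set[Edge]:
    """Phase 1 (assumed form): repeatedly drop vertices with degree < min_deg."""
    adj: dict[int, set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    queue = deque(x for x in adj if len(adj[x]) < min_deg)
    removed: set[int] = set()
    while queue:
        x = queue.popleft()
        if x in removed:
            continue  # a vertex can be enqueued more than once
        removed.add(x)
        for y in adj[x]:
            adj[y].discard(x)
            if y not in removed and len(adj[y]) < min_deg:
                queue.append(y)
        adj[x].clear()
    return {e for e in edges if e[0] not in removed and e[1] not in removed}


def build_tsets(edges: set[Edge]) -> dict[Edge, set[int]]:
    """Phase 2: for every edge (u, v), the set of w closing a triangle u-v-w."""
    adj: dict[int, set[int]] = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {(u, v): adj[u] & adj[v] for u, v in edges}


def k_truss(edges: set[Edge], k: int) -> set[Edge]:
    """Return the edges whose support in the remaining graph is >= k - 2."""
    edges = {norm(u, v) for u, v in edges if u != v}
    edges = k_core_prune(edges, k - 1)
    tset = build_tsets(edges)
    # Phase 3: iteratively peel edges whose support |TSet| is below k - 2.
    queue = deque(e for e, ws in tset.items() if len(ws) < k - 2)
    while queue:
        e = queue.popleft()
        if e not in tset:
            continue  # already peeled
        u, v = e
        for w in tset.pop(e):
            # Triangle u-v-w is destroyed: update the TSets of its other two edges.
            for f, gone in ((norm(u, w), v), (norm(v, w), u)):
                if f in tset:
                    tset[f].discard(gone)
                    if len(tset[f]) < k - 2:
                        queue.append(f)
    return set(tset)


# Usage: a 4-clique {0,1,2,3}, a triangle {3,4,5}, and a pendant edge (5,6).
G = {(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
     (3, 4), (3, 5), (4, 5), (5, 6)}
print(sorted(k_truss(G, 4)))  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

Keeping the TSet of each edge, rather than only a triangle count, lets the peeling step find exactly which neighbouring edges lose support when an edge is removed, without recounting triangles.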
