Limited random walk algorithm for big graph data clustering

Honglei Zhang, Moncef Gabbouj, Jenni Raitoharju, Serkan Kiranyaz

doi:10.1186/s40537-016-0060-5

Abstract

Graph clustering is an important technique for understanding the relationships between the vertices in a big graph. In this paper, we propose a novel random-walk-based graph clustering method. The proposed method restricts the reach of the walking agent using an inflation function and a normalization function. We analyze the behavior of the limited random walk procedure and propose a novel algorithm for both global and local graph clustering problems. Previous random-walk-based algorithms depend on the chosen fitness function to find the clusters around a seed vertex. The proposed algorithm tackles the problem in an entirely different manner. We use the limited random walk procedure to find attractor vertices in a graph and use them as features to cluster the vertices. According to the experimental results on simulated graph data and real-world big graph data, the proposed method is superior to the state-of-the-art methods in solving graph clustering problems. Since the proposed method follows the embarrassingly parallel paradigm, it can be efficiently implemented and embedded in any parallel computing environment, such as a MapReduce framework. Given enough computing resources, we are capable of clustering graphs with millions of vertices and hundreds of millions of edges in a reasonable time.
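The idea of restricting a walk with an inflation step and a normalization step can be illustrated with a minimal Python sketch. This is not the authors' implementation: the power-function inflation, the exponent `r=2.0`, the added self-loops, the stopping rule, and the name `limited_random_walk` are all assumptions made for illustration only.

```python
def limited_random_walk(adj, seed, r=2.0, max_iter=200, tol=1e-10):
    """Return a dict vertex -> probability after a limited walk from `seed`.

    adj: dict mapping each vertex to a list of its neighbors (undirected).
    r:   inflation exponent (> 1); an illustrative choice, not from the paper.
    """
    x = {seed: 1.0}  # all probability mass starts on the seed vertex
    for _ in range(max_iter):
        # 1. Ordinary random-walk step. A self-loop is added at every vertex
        #    (an assumption of this sketch) to avoid periodic oscillation.
        y = {}
        for v, p in x.items():
            targets = list(adj[v]) + [v]
            share = p / len(targets)
            for u in targets:
                y[u] = y.get(u, 0.0) + share
        # 2. Inflation: raising to a power > 1 favors already-likely vertices.
        y = {v: p ** r for v, p in y.items()}
        # 3. Normalization: rescale so the values sum to 1 again.
        total = sum(y.values())
        y = {v: p / total for v, p in y.items()}
        # Stop once the distribution no longer changes noticeably.
        change = sum(abs(y.get(v, 0.0) - x.get(v, 0.0)) for v in set(x) | set(y))
        x = y
        if change < tol:
            break
    return x


# Two triangles joined by a single bridge edge (2-3).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
         3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
dist = limited_random_walk(graph, seed=0)
# Vertices that keep a large share of the mass act as attractors of the seed.
print(sorted(v for v, p in dist.items() if p > 0.1))
```

In this toy setting, vertices whose walks settle on the same high-probability vertices could be grouped into one cluster. Because each seed is processed independently, the walks could in principle be run in parallel.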

Highlights

Graph data are important data types in many scientific areas, such as social network analysis, bioinformatics, and computer and information network analysis [1]
We analyze the behavior of the limited random walk procedure and propose a novel algorithm for both global and local graph clustering problems
We propose a novel random-walk-based graph clustering algorithm—the limited random walk (LRW) algorithm

Summary

Background

Graph data are important data types in many scientific areas, such as social network analysis, bioinformatics, and computer and information network analysis [1]. Graph clustering (also called “community detection” in the literature) algorithms aim to reveal the heterogeneity and find the underlying relations between vertices [2]. This technique is critical for understanding the properties, predicting the dynamic behavior, and improving the visualization of big graph data. Newman defined a modularity measure based on the probability of a link between any two vertices. He applied a greedy search method to maximize this modularity fitness function in order to partition a graph into clusters [5]. The accuracy of any criteria-based clustering method (or those combined with random walk procedures) is greatly affected by the chosen clustering fitness function. Most local clustering algorithms use criteria that are more suitable for the global graph clustering problem. These choices greatly degrade the performance of these algorithms when the graph is big and highly uneven. The rest of the paper is organized as follows: the basics of the random walk procedure and the proposed LRW algorithm are explained in the "Methodology" section; an extensive set of experiments on simulated and real graph data, along with both numerical and visual evaluations, is given in the "Experiments" section; the conclusions and future work are discussed in the "Conclusions" section.
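To make the modularity measure mentioned above concrete, the following Python sketch computes the standard Newman-Girvan modularity of a given partition: the fraction of edges that fall inside clusters minus the fraction expected if edges were placed at random with the same vertex degrees. The function name `modularity` and its input format (an edge list plus a vertex-to-cluster mapping) are assumptions for illustration, and the sketch assumes a simple undirected graph without self-loops.

```python
def modularity(edges, community):
    """Newman-Girvan modularity of a partition.

    edges:     list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every vertex to a cluster label.
    """
    m = len(edges)
    # Observed fraction of edges whose endpoints share a cluster.
    inside = sum(1 for u, v in edges if community[u] == community[v]) / m
    # Total degree per cluster.
    degree_sum = {}
    for u, v in edges:
        for w in (u, v):
            c = community[w]
            degree_sum[c] = degree_sum.get(c, 0) + 1
    # Expected fraction of intra-cluster edges under random rewiring.
    expected = sum((d / (2 * m)) ** 2 for d in degree_sum.values())
    return inside - expected


# Two triangles joined by one bridge edge, split into the two triangles.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, labels))
```

A greedy search of the kind described above would repeatedly merge the pair of clusters that most improves this score.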

Methodology

Conclusions


Journal: Journal of Big Data
Publication Date: Dec 1, 2016
Citations: 18
License type: CC BY 4.0


