Application-specific graph sampling for frequent subgraph mining and community detection

Sumit Purohit, Sutanay Choudhury, Lawrence B Holder

doi:10.1109/bigdata.2017.8258022

Abstract

Graph mining is an important data analysis methodology, but it struggles as the input graph size increases. The scalability and usability challenges posed by such large graphs make it imperative to sample the input graph and reduce its size. The critical challenge in sampling is to identify the appropriate algorithm to ensure the resulting analysis does not suffer heavily from the data reduction. Predicting the expected performance degradation for a given graph and sampling algorithm is also useful. In this paper, we present different sampling approaches for graph mining applications such as Frequent Subgraph Mining (FSM) and Community Detection (CD). We explore graph metrics such as PageRank, Triangles, and Diversity to sample a graph and conclude that for heterogeneous graphs, Triangles and Diversity perform better than degree-based metrics. We also present two new sampling variations for targeted graph mining applications. We present empirical results to show that knowledge of the target application, along with input graph properties, can be used to select the best sampling algorithm. We also conclude that performance degradation is an abrupt, rather than gradual, phenomenon as the sample size decreases, and we present empirical results showing that it follows a logistic function. The original datasets, implementations of the sampling algorithms, and results are available online.
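
To make the idea of metric-based sampling concrete, here is a minimal sketch that scores each node by the number of triangles it participates in and keeps the highest-scoring fraction of nodes as an induced subgraph. It is not the authors' implementation: the function names (`triangle_counts`, `sample_by_metric`), the deterministic top-k selection rule, and the toy graph are all assumptions made for illustration, and the paper's algorithms may select nodes differently (for example, probabilistically).

```python
from itertools import combinations


def triangle_counts(adj):
    """Count the triangles each node participates in.

    `adj` maps node -> set of neighbours (undirected, no self-loops).
    Hypothetical helper, not taken from the paper.
    """
    counts = {}
    for node, nbrs in adj.items():
        # A triangle exists for every pair of neighbours that are themselves adjacent.
        counts[node] = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return counts


def sample_by_metric(adj, scores, fraction):
    """Keep the top `fraction` of nodes by score; return the induced subgraph.

    Assumption: deterministic top-k selection, ties broken by node name.
    """
    k = max(1, round(fraction * len(adj)))
    keep = set(sorted(adj, key=lambda n: (-scores[n], str(n)))[:k])
    return {n: adj[n] & keep for n in keep}


# Toy graph (illustrative only): triangle a-b-c plus a path c-d-e.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

scores = triangle_counts(adj)
sample = sample_by_metric(adj, scores, 0.6)
```

Any other per-node metric, such as PageRank or a diversity score, could be passed as `scores` in the same way, which is what would make the choice of metric easy to compare across target applications.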
