High performance frequent subgraph mining on transaction datasets: A survey and performance comparison

Bismita S Jena, Cynthia Khan, Rajshekhar Sunderraman

doi:10.26599/bdma.2019.9020006

Open Access

https://doi.org/10.26599/bdma.2019.9020006

Journal: Big Data Mining and Analytics	Publication Date: Sep 1, 2019
Citations: 6	License type: cc-by

Affiliation: Georgia State University

Abstract

Graph data mining has become a crucial area of research. Large amounts of graph data are produced in many areas, such as bioinformatics, cheminformatics, and social networks. Scalable graph data mining methods are increasingly popular and necessary because of growing graph complexity. Frequent Subgraph Mining (FSM) is one such area, where the task is to find frequently recurring patterns, or subgraphs. Many main-memory-based methods were proposed to tackle this problem, but they proved inefficient as data sizes grew exponentially over time. In the past few years, several research groups have attempted to handle the FSM problem in multiple ways. Many authors have tried to achieve better performance using Graphics Processing Units (GPUs), which offer multi-fold improvement over in-memory methods on large datasets. Later, Google's MapReduce model, together with the Hadoop framework, proved to be a major breakthrough in high-performance large-batch processing. Although MapReduce brought many benefits, its disk I/O and non-iterative model did not help the FSM domain much, since subgraph mining is an iterative process. In recent years, Spark has emerged as the de facto industry standard with its distributed in-memory computing capability, which also suits an iterative style of programming. In this survey, we cover how high-performance computing has tremendously improved performance on transactional directed and undirected graphs, and we compare the performance of various FSM techniques based on experimental results.

Highlights

Frequent pattern mining has become one of the major research areas since the appearance of the seminal paper[1] published by Agrawal and Srikant on item sets.
We found that the disk I/O and non-iterative computing style of the Object-Oriented approach to Frequent SubGraph Mining (OO-FSG)[24] and MRFSM[25] were their major drawbacks, which gave us the insight to apply the distributed in-memory Spark engine.
Definition 1 (Graph): A graph is defined as an ordered pair G = (V, E).

Summary

Introduction

Big Data Mining and Analytics, September 2019, 2(3): 159–180

... size graphs, and the second category belongs to single graphs, where the dataset contains a single large graph.

Paper Organization: The paper is organized as follows: Section 2 presents definitions related to FSM and surveys pioneering works in the area of FSM for transactional graphs. It covers memory-based single-machine techniques (Apriori-based methods and pattern growth approaches), disk-based techniques (partition-based approach, traditional database approach, and parallel and distributed approach), and distributed in-memory approaches.

Definition 1 (Graph): A graph is defined as an ordered pair G = (V, E).

PATH[7] and FSG[41] algorithms were developed. Another group of researchers used a non-Apriori-based approach [MoFa, gSpan, FFSM, GASTON], where the subgraphs were extended by adding a single edge each time.
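As a rough illustration of the transactional setting, the sketch below stores each graph as a pair (V, E) and counts, for every single-edge pattern, how many graphs in the dataset contain it. This is not any of the surveyed algorithms: the helper names (`make_graph`, `edge_patterns`, `frequent_edges`), the toy dataset, and the use of an absolute `min_support` count are assumptions made for the example, and only undirected, vertex-labeled graphs without self-loops are handled. Frequent single edges are the usual starting point that edge-extension methods then grow one edge at a time.

```python
from collections import Counter

def make_graph(vertex_labels, edges):
    """Build a labeled graph G = (V, E): V maps vertex id -> label, E is a set of (u, v) pairs."""
    return {"V": dict(vertex_labels), "E": set(edges)}

def edge_patterns(graph):
    """Return the distinct single-edge patterns in one graph as sorted label pairs."""
    labels = graph["V"]
    # Sorting the two labels makes (A, B) and (B, A) the same undirected pattern.
    return {tuple(sorted((labels[u], labels[v]))) for u, v in graph["E"]}

def frequent_edges(dataset, min_support):
    """Count in how many graphs each edge pattern occurs; keep those meeting min_support."""
    counts = Counter()
    for graph in dataset:
        # A pattern is counted at most once per graph (transaction-style support).
        counts.update(edge_patterns(graph))
    return {pattern: n for pattern, n in counts.items() if n >= min_support}

# Hypothetical toy dataset of three small graphs.
dataset = [
    make_graph({1: "A", 2: "B", 3: "C"}, [(1, 2), (2, 3)]),
    make_graph({1: "A", 2: "B"}, [(1, 2)]),
    make_graph({1: "B", 2: "C", 3: "C"}, [(1, 2), (2, 3)]),
]

result = frequent_edges(dataset, min_support=2)
print(result)
```

With `min_support=2`, only the A-B and B-C edges survive, since the C-C edge appears in a single graph. Counting larger patterns would additionally require subgraph isomorphism checks, which this sketch deliberately leaves out.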

Memory-based single machine techniques

Pattern growth approach

Disk-based techniques

Partition-based approach

Parallel and distributed approach

Distributed in-memory techniques

Subgraph construction

Details of MapReduce-FSG

Experimental details

Findings

Concluding Remarks
