Abstract

Graph data mining has become a crucial and unavoidable area of research. Large amounts of graph data are produced in many areas, such as Bioinformatics, Cheminformatics, and Social Networks. Scalable graph data mining methods are becoming increasingly popular and necessary as graph complexity increases. Frequent subgraph mining is one such area, where the task is to find frequently recurring patterns (subgraphs). To tackle this problem, many main-memory-based methods were proposed, which proved inefficient as data sizes grew exponentially over time. In the past few years, several research groups have attempted to handle the Frequent Subgraph Mining (FSM) problem in multiple ways. Many authors have tried to achieve better performance using Graphics Processing Units (GPUs), which offer multi-fold improvements over in-memory methods when dealing with large datasets. Later, Google's MapReduce model, together with the Hadoop framework, proved to be a major breakthrough in high-performance large-scale batch processing. Although MapReduce came with many benefits, its disk I/O and non-iterative model were of limited help for the FSM domain, since subgraph mining is an iterative process. In recent years, Spark has emerged as the de facto industry standard with its distributed in-memory computing capability, which also makes it a good fit for iterative programming. In this survey, we cover how high-performance computing has helped to improve performance tremendously for transactional directed and undirected graphs, and we compare the performance of various FSM techniques based on experimental results.

Highlights

  • Frequent pattern mining has become one of the major research areas since the appearance of the seminal paper[1] published by Agrawal and Srikant on item sets

  • We found that disk I/O and the non-iterative style of computing were the major drawbacks of the Object-Oriented approach to Frequent SubGraph Mining (OO-FSG)[24] and MRFSM[25], which gave us the insight to apply the distributed in-memory Spark engine

  • Jena et al.: High Performance Frequent Subgraph Mining on Transaction Datasets: A Survey and ... Definition 1 (Graph): A graph is defined as an ordered pair G = (V, E)


Summary

Introduction

Big Data Mining and Analytics, September 2019, 2(3): 159–180

size graphs, and the second category belongs to single graphs, where the dataset contains a single large graph.

Paper Organization: The paper is organized as follows: Section 2 presents definitions related to FSM and surveys pioneering works in the area of FSM for transactional graphs. It covers memory-based single machine techniques (Apriori-based methods and pattern growth approaches), disk-based techniques (partition-based approach, traditional database approach, and parallel and distributed approach), and distributed in-memory approaches.

Definition 1 (Graph): A graph is defined as an ordered pair G = (V, E).

PATH[7] and FSG[41] algorithms were developed. Another group of researchers used a non-Apriori-based approach [Mofa, gSpan, FFSM, GASTON], where the subgraphs were extended by adding a single edge each time.
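To make the graph definition and the transactional setting concrete, the following is a minimal Python sketch, not code from the surveyed paper. It assumes vertex-labeled undirected graphs stored as a pair (V, E) and counts, for each labeled edge, the number of transaction graphs that contain it. Frequent single-edge patterns of this kind are the usual starting point that edge-extension methods then grow. The names Graph, edge_patterns, frequent_edges, and min_support are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Assumption: a graph is the ordered pair (V, E), where V maps a vertex id
# to its label and E is a set of undirected edges given as (u, v) id pairs.
Graph = Tuple[Dict[int, str], Set[Tuple[int, int]]]
EdgePattern = Tuple[str, str]


def edge_patterns(graph: Graph) -> Set[EdgePattern]:
    """Return the distinct labeled edges of one graph.

    Labels are sorted so that (A, B) and (B, A) denote the same
    undirected pattern.
    """
    vertices, edges = graph
    patterns: Set[EdgePattern] = set()
    for u, v in edges:
        first, second = sorted((vertices[u], vertices[v]))
        patterns.add((first, second))
    return patterns


def frequent_edges(dataset: List[Graph], min_support: int) -> Dict[EdgePattern, int]:
    """Count in how many transaction graphs each labeled edge occurs.

    A graph contributes at most 1 to the support of a pattern, no matter
    how many times the pattern appears inside that graph.
    """
    support: Counter = Counter()
    for graph in dataset:
        support.update(edge_patterns(graph))
    return {p: c for p, c in support.items() if c >= min_support}


# A toy transaction dataset of three small graphs (illustrative data only).
dataset: List[Graph] = [
    ({0: "A", 1: "B", 2: "C"}, {(0, 1), (1, 2)}),
    ({0: "A", 1: "B"}, {(0, 1)}),
    ({0: "B", 1: "A", 2: "A"}, {(0, 1), (0, 2)}),
]

result = frequent_edges(dataset, min_support=2)
print(result)
```

With a minimum support of 2, only the A-B edge survives in this toy dataset, because the B-C edge occurs in a single graph. Counting per graph rather than per occurrence is what distinguishes the transactional setting from the single-large-graph setting mentioned above.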

Memory-based single machine techniques
Pattern growth approach
Disk-based techniques
Partition-based approach
Parallel and distributed approach
Distributed in-memory techniques
Subgraph construction
Details of MapReduce-FSG
Experimental details
Findings
Concluding Remarks