Abstract

Large graphs are ubiquitous and scale-free, with irregular relationships and non-trivial topology. Frequent subgraph mining is a popular method for extracting knowledge from graphs. Most existing frequent subgraph mining algorithms are centralized algorithms that cannot handle a single large graph efficiently and incur high communication cost. To make subgraph mining less computationally expensive, approximate subgraph mining can be applied, which captures subgraphs structurally similar to those found by exact subgraph mining. In this paper, we propose an approximate subgraph mining algorithm named Ap-FSM, implemented on the distributed graph environment Pregel. Ap-FSM works in three phases. The first phase selects a representative graph from the original graph while preserving the original graph's properties. The second phase efficiently performs subgraph extension. The third phase introduces a novel two-step optimization for subgraph pruning. Analyzing such large graph data is beneficial from the perspective of expert and intelligent systems, as the discovered patterns can be used for knowledge discovery and decision making. To evaluate the performance of Ap-FSM, experiments are performed on three real-life datasets with up to a billion edges. The results show that the proposed Ap-FSM significantly outperforms state-of-the-art frequent subgraph mining algorithms and overcomes the challenges of performing frequent subgraph mining on a massive graph. The results also show that Ap-FSM achieves high scalability and speedup in a distributed graph environment and is highly accurate in finding frequent subgraphs in a single large graph.
