Query-Sensitive Graph Partitioner for Pattern Matching Applications

Li Lu, Bei Hua

doi:10.1109/access.2019.2960868

Abstract

Searching and mining in large graphs is critical to a variety of applications, and pattern matching is at the core of both. Scalable processing of large graphs requires careful distribution of the graph across a cluster. Graph partitioning divides a big graph into several non-overlapping subgraphs and assigns each subgraph to a compute node. Traditional workload-agnostic partitioners aim to minimize the number of inter-partition edges using only graph topology, which may not yield the best solution if the workload is skewed. Some workload-aware partitioners mine information from a specific workload and use it to minimize the number of inter-partition traversals during execution; however, their methods are not suitable for pattern matching applications. In this work, we propose a query-sensitive graph partitioner that aims to improve an existing partitioning for a given pattern matching workload. The partitioner takes any initial partitioning as a starting point and iteratively adjusts it by exchanging chosen clusters across partitions, heuristically reducing the probability of inter-partition traversals. We identify a few implementation-independent factors that may increase the traversal probability of an edge and quantify them as a computable indicator using information from query patterns and graph topology. We then propose an efficient algorithm to calculate the indicator and implement a graph repartitioner by combining the indicator with a greedy cluster-exchanging mechanism. Finally, we generate a large heterogeneous labeled graph from real-world data crawled from the Netease Music website and evaluate the partitioning quality of our repartitioner with a few meaningful query patterns of common topologies, including line, loop and branching. Compared with hash-based partitioning, our system reduces inter-partition traversals by at least 70%. Compared with the state-of-the-art graph partitioner Metis, our repartitioner reduces inter-partition traversals by at least 50%.
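The difference between counting inter-partition edges and estimating inter-partition traversals can be illustrated with a small sketch. It is not taken from the paper: the function names, the toy 4-cycle graph, and the per-edge traversal probabilities are all assumptions chosen for illustration. It shows two balanced partitionings that a topology-only objective cannot tell apart, yet that differ widely once a skewed workload is taken into account.

```python
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_cut(weights: Dict[Edge, float], part: Dict[Hashable, int]) -> int:
    """Topology-only objective: number of edges whose endpoints are in different partitions."""
    return sum(1 for (u, v) in weights if part[u] != part[v])


def expected_cut_traversals(weights: Dict[Edge, float], part: Dict[Hashable, int]) -> float:
    """Workload-aware objective: sum of (assumed) traversal probabilities over cut edges."""
    return sum(p for (u, v), p in weights.items() if part[u] != part[v])


# Hypothetical 4-cycle a-b-c-d-a. The workload is skewed: edges a-b and c-d
# are traversed far more often than b-c and d-a.
weights = {("a", "b"): 0.9, ("b", "c"): 0.1, ("c", "d"): 0.9, ("d", "a"): 0.1}

part_hot_cut = {"a": 0, "d": 0, "b": 1, "c": 1}   # cuts the two hot edges
part_cold_cut = {"a": 0, "b": 0, "c": 1, "d": 1}  # cuts the two cold edges

print(edge_cut(weights, part_hot_cut), edge_cut(weights, part_cold_cut))
print(expected_cut_traversals(weights, part_hot_cut),
      expected_cut_traversals(weights, part_cold_cut))
```

Both partitionings cut exactly two edges, so an edge-cut objective ranks them equally, while the traversal-weighted cost of the first is several times that of the second.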

Highlights

Modern big data increasingly appear in the form of large heterogeneous labeled graphs.
We propose a simple heuristic to estimate the traversal probability of each edge by combining topological information from the graph and the query patterns, and design an efficient algorithm to calculate it.
We propose a simple greedy algorithm to compute the exchanging cluster of each partition, so that the global inter-partition traversal probability decreases if these clusters are exchanged between partitions.
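The idea of greedily exchanging vertices between partitions to lower the total inter-partition traversal probability can be sketched as follows. This is not the paper's algorithm, which is not specified in the text here: it is a simplified Kernighan-Lin-style pairwise swap between two partitions, substituted for the cluster exchange, and the function names and edge weights are hypothetical. Swapping one vertex for one vertex keeps the partition sizes unchanged.

```python
from itertools import product
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def cut_weight(weights: Dict[Edge, float], part: Dict[Hashable, int]) -> float:
    """Sum of (assumed) traversal probabilities over edges that cross partitions."""
    return sum(p for (u, v), p in weights.items() if part[u] != part[v])


def swap_gain(weights, part, a, b) -> float:
    """Reduction in cut weight if vertices a and b trade partitions."""
    trial = dict(part)
    trial[a], trial[b] = part[b], part[a]
    return cut_weight(weights, part) - cut_weight(weights, trial)


def greedy_exchange(weights, part, max_rounds: int = 100):
    """Repeatedly apply the best single swap between partitions 0 and 1
    until no swap lowers the cut weight (or max_rounds is reached)."""
    part = dict(part)  # work on a copy; the caller's partitioning is untouched
    for _ in range(max_rounds):
        left = [v for v, p in part.items() if p == 0]
        right = [v for v, p in part.items() if p == 1]
        pairs = list(product(left, right))
        if not pairs:
            break
        best = max(pairs, key=lambda ab: swap_gain(weights, part, *ab))
        if swap_gain(weights, part, *best) <= 1e-12:
            break  # no improving swap remains
        a, b = best
        part[a], part[b] = part[b], part[a]
    return part


# Hypothetical 4-cycle with a skewed workload; the initial partitioning cuts
# the two frequently traversed edges a-b and c-d.
weights = {("a", "b"): 0.9, ("b", "c"): 0.1, ("c", "d"): 0.9, ("d", "a"): 0.1}
initial = {"a": 0, "d": 0, "b": 1, "c": 1}
improved = greedy_exchange(weights, initial)
print(cut_weight(weights, initial), cut_weight(weights, improved))
```

Recomputing the full cut weight for every candidate swap is quadratic and only acceptable for a toy graph; a practical repartitioner would maintain per-vertex gains incrementally and move whole clusters rather than single vertices.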

Summary

INTRODUCTION

Modern big data increasingly appear in the form of large heterogeneous labeled graphs. FENNEL [7] overcomes the high computational complexity of the traditional k-balanced graph partitioning problem by relaxing the hard cardinality constraints. This method provides a unifying framework that accommodates many of the previously proposed heuristics as special cases. TAPER is the work most relevant to this paper: it improves an existing graph partitioning for a set of query patterns without using a historical trace or log. The specific contributions of this work are as follows: (1) First, we identify a few implementation-independent factors that influence the edge traversal probability (ETP) in pattern matching and develop a heuristic formula to estimate it using information from query patterns and graph topology.

RELATED WORKS

INDICATOR OF EDGE TRAVERSAL PROBABILITY

CALCULATING EDGE TRAVERSAL PROBABILITY

COMPUTING EXCHANGING CLUSTERS

CONCLUSION

Journal: IEEE Access
Publication Date: Jan 1, 2019
Citations: 10
License type: CC BY 4.0
