Abstract
In recent years, the rapid growth of the Internet has led to the creation of massive graphs. Because datasets have become so large, they can no longer be processed by a single machine in an acceptable time; traditional graph partitioning methods, which often rely on a complete view of the entire graph, are therefore not applicable to large datasets. This challenge has given rise to a new approach called streaming graph partitioning. In streaming graph partitioning, a partitioner receives a stream of input data and decides which computational machine each item should be sent to. A streaming partitioner often has no information about the whole graph and usually distributes vertices using greedy heuristics that may not be optimal for incoming vertices. Hence, the partitioner's decisions can be significantly improved if more information about the graph is used. In this paper, we present a new vertex-cut streaming graph partitioning approach. The proposed method postpones the decision for some edges (by means of intelligent buffering) and corrects some past decisions to improve the quality of the partitioning. The proposed approach is evaluated on real-world graphs. The experimental results show that the proposed method outperforms the previous HDRF method.
Highlights
In recent years, the rapid development of the Internet has led to the emergence of large graphs
We propose Reassignment and Buffer based Streaming Edge Partitioning (RBSEP), which produces balanced partitions and improves partitioning quality in terms of vertex-cut
(1) A partitioner such as High Degree Replicated First (HDRF) assigns edges that have not been copied to any partition to the partition with the smallest number of edges, based only on its balance criterion
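The balance-only rule in highlight (1) can be illustrated with a minimal sketch. It assumes that "edges that have not been copied to any partition" means edges whose endpoints have no replica yet, and it covers only that case; it is not the full HDRF scoring function. The names `least_loaded_partition` and `place_fresh_edges` are hypothetical.

```python
def least_loaded_partition(loads):
    """Index of the partition holding the fewest edges; ties go to the lowest index."""
    return min(range(len(loads)), key=lambda p: loads[p])


def place_fresh_edges(edges, k):
    """Place edges whose endpoints have no replica yet, using balance alone.

    Assumption: every edge in `edges` is vertex-disjoint from the others,
    so the balance-only case applies to each of them.
    """
    loads = [0] * k          # number of edges currently held by each partition
    placement = {}           # edge -> chosen partition index
    for edge in edges:
        p = least_loaded_partition(loads)
        placement[edge] = p
        loads[p] += 1
    return placement, loads


# Three vertex-disjoint edges streamed into k = 2 partitions.
placement, loads = place_fresh_edges([("a", "b"), ("c", "d"), ("e", "f")], k=2)
```

Because the choice ignores graph structure entirely, the placement depends only on arrival order and current loads, which is the kind of decision that extra information about the graph could improve.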
Summary
In recent years, the rapid development of the Internet has led to the emergence of large graphs. Real-world graphs follow a power-law degree distribution, with a few high-degree vertices and many low-degree vertices. It has been shown that edge partitioning can be more efficient for partitioning power-law graphs [7, 8].
Problem statement
Natural graphs have a prominent property: a skewed power-law degree distribution. Most vertices have relatively few neighbors, while a few vertices have many, and the probability that a vertex has degree d is $P(d) \propto d^{-\alpha}$ (1). To formally define the k-way vertex-cut partitioning problem, we represent a graph as G = (E, V), where V is the set of vertices and E is the set of edges; the set of partitions is … Note that the input data is a random list of edges, which the partitioner receives and processes in a streaming manner.
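The summary does not give the partition set or the objective in full. As a hedged illustration, vertex-cut quality is commonly measured by the replication factor (the average number of partitions holding a copy of each vertex) together with a load-balance ratio. The sketch below assumes that formulation; the function names `replication_factor` and `load_imbalance` are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def replication_factor(assignment):
    """Average number of partitions on which each vertex appears.

    `assignment` maps an edge (u, v) to the index of its partition.
    """
    replicas = defaultdict(set)          # vertex -> partitions holding a copy
    for (u, v), p in assignment.items():
        replicas[u].add(p)
        replicas[v].add(p)
    return sum(len(parts) for parts in replicas.values()) / len(replicas)


def load_imbalance(assignment, k):
    """Largest partition size divided by the ideal size |E| / k."""
    loads = [0] * k
    for p in assignment.values():
        loads[p] += 1
    return max(loads) / (len(assignment) / k)


# Star graph: hub 0 linked to leaves 1..4, split evenly over k = 2 partitions.
assignment = {(0, 1): 0, (0, 2): 0, (0, 3): 1, (0, 4): 1}
rf = replication_factor(assignment)      # hub has 2 copies, each leaf has 1
imbalance = load_imbalance(assignment, 2)
```

In this example only the high-degree hub is cut, which is why edge partitioning suits power-law graphs: replicating the few high-degree vertices keeps the many low-degree vertices on a single partition.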