Placing big graph into cloud for parallel processing with a two-phase community-aware approach

Kekun Hu,Guosun Zeng

doi:10.1016/j.future.2019.07.014

Abstract

Abstract Big graphs are so large that their analysis often rely on the cloud for parallel processing. Data placement, as a key pre-processing step, has a profound impact on the performance of parallel processing. Traditional placement methods fail to preserve graph topologies, leading to poor performance. As the community is the most common structure of big graphs, in this work, we present a two-phase community-aware placement algorithm to place big graphs into the cloud for parallel processing. It can obtain a placement scheme that preserves the community structure well by maximizing the modularity density of the scheme under memory capacity constraints of computational nodes of the cloud in two phases. In the first phase, we design a streaming partitioning heuristic to detect communities based on partial and incomplete graph information. They form an initial placement scheme with relatively high modularity density. To improve it further, in the second phase, we put forward a scale-constrained kernel k-means algorithm. It takes as input the initial placement scheme and iteratively redistributes graph vertices across computational nodes under scale constraints until the modularity density cannot be improved any further. Finally, experiments show that our algorithm can preserve graph topologies well and greatly support parallel processing of big graphs in the cloud.

Full Text