DPHV: a distributed algorithm for large-scale graph partitioning

Wilfried Yves Hamilton Adoni, Tarik Nahhal, Ismail Assayad, Abdeltif El Byed, Moez Krichen

doi:10.1186/s40537-020-00357-y


Open Access


Abstract

Big graphs are part of the “Not Only SQL” (NoSQL) database movement, which focuses on the relationships between data rather than on the values themselves. Data is stored in vertices, while edges model the interactions or relationships between these data. Graphs offer flexibility in handling strongly connected data. The analysis of a big graph generally involves exploring all of its vertices. This operation is costly in time and resources because big graphs are generally composed of millions of vertices connected through billions of edges. Consequently, graph algorithms are expensive relative to the size of the big graph and are therefore ineffective for data exploration. Partitioning the graph stands out as an efficient and less expensive alternative for exploring a big graph. This technique consists of partitioning the graph into a set of k sub-graphs in order to reduce the complexity of queries. Nevertheless, it presents many challenges because it is an NP-complete problem. In this article, we present DPHV (Distributed Placement of Hub-Vertices), an efficient parallel and distributed heuristic for large-scale graph partitioning. An application to real-world graphs demonstrates the feasibility and reliability of our method. Experiments carried out on a 10-node Spark cluster showed that the proposed methodology achieves a significant gain in terms of time and outperforms JA-BE-JA, Greedy, and DFEP.

Highlights

Graphs are ubiquitous [1] in the engineering sciences because they are a flexible model for complex phenomena arising in various disciplines [2]: biological, sociological, economic, physical, and technological
We introduce Distributed Placement of Hub-Vertices (DPHV), a distributed and parallel heuristic suited to partitioning large-scale graphs according to the vertex-centric paradigm; it uses a monitoring agent which ensures that the weight of each partition stays within normal limits
We introduced DPHV, an algorithm based on the placement of hub vertices, that is to say, the vertices that have a great impact on the weight and the topology of the graph [8]
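
To make the idea of hub-vertex placement under a weight constraint concrete, here is a minimal sequential sketch in Python. It is not the authors' DPHV algorithm, which is distributed and vertex-centric. It only illustrates one plausible greedy scheme. It assumes that "hub" means a high-degree vertex and that partition weight is the number of vertices. The names `hub_first_partition`, `edge_cut`, and `slack` are hypothetical.

```python
import math
from collections import defaultdict


def hub_first_partition(edges, k, slack=1.0):
    """Greedy k-way partitioning that places high-degree vertices first.

    Illustrative only: 'hub' is taken to mean high degree, and the weight
    of a partition is taken to be its vertex count (both are assumptions).
    """
    if k < 1 or slack < 1.0:
        raise ValueError("need k >= 1 and slack >= 1.0")

    # Build an undirected adjacency map from the edge list.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Hypothetical weight cap: at most ceil(slack * n / k) vertices each.
    n = len(adj)
    cap = max(1, math.ceil(slack * n / k))

    # Visit vertices from highest to lowest degree; ties broken by id.
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))

    part = {}
    sizes = [0] * k
    for i, v in enumerate(order):
        if i < k:
            # Seed each partition with one of the k largest hubs.
            best = i
        else:
            # Count already-placed neighbours in each partition.
            score = [0] * k
            for w in adj[v]:
                if w in part:
                    score[part[w]] += 1
            # Balance check: only partitions below the cap are eligible.
            open_parts = [p for p in range(k) if sizes[p] < cap]
            # Prefer most neighbours, then lightest partition, then lowest index.
            best = max(open_parts, key=lambda p: (score[p], -sizes[p], -p))
        part[v] = best
        sizes[best] += 1
    return part


def edge_cut(edges, part):
    """Number of edges whose endpoints lie in different partitions."""
    return sum(1 for u, v in edges if part[u] != part[v])


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
part = hub_first_partition(edges, k=2, slack=1.0)
print(edge_cut(edges, part))  # 1
```

The cap check plays the role that the highlights attribute to a monitoring agent, in the loosest sense: a vertex is never assigned to a partition that is already full. A greedy order like this gives no guarantee on cut quality in general.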

Summary

Introduction

Graphs are ubiquitous [1] in the engineering sciences because they are a flexible model for complex phenomena arising in various disciplines [2]: biological, sociological, economic, physical, and technological. A great deal of research has been dedicated to improving methods of analysis for these networks [3, 4]. The effectiveness and applicability of these methods are still limited to small networks because of the complexity of exhaustive analysis [3]. The analysis of a complex network is very expensive and consumes a lot of hardware resources because of the NP-completeness of the problem [5, 6]. With their heterogeneity, graphs make it possible to analyze chaotic dynamics or to represent a complex phenomenon.

Objectives

Methods

Results

Conclusion