An analysis of the graph processing landscape

Miguel E Coimbra, Alexandre P Francisco, Luís Veiga

doi:10.1186/s40537-021-00443-9

Abstract

The value of graph-based big data can be unlocked by exploring the topology and metrics of the networks they represent, and the computational approaches to this exploration take many forms. For the use case of performing global computations over a graph, the graph is first ingested into a graph processing system from one of many digital representations. Extracting information from graphs involves processing all their elements globally, which can be done with single-machine systems (with varying approaches to hardware usage), distributed systems (either homogeneous or heterogeneous groups of machines) and systems dedicated to high-performance computing (HPC). For these systems, which focus on processing the bulk of graph elements, common use cases include executing algorithms for vertex ranking or community detection, which produce insights into graph structure and the relevance of its elements. Many distributed systems (such as Flink and Spark) and libraries (e.g. Gelly, GraphX) have been built to enable these tasks and improve performance. This is achieved with techniques ranging from classic load balancing (often geared to reduce communication overhead) to exploring trade-offs between delaying computation and relaxing accuracy. In this survey we first familiarize the reader with common graph datasets and applications in use today. We provide an overview of different aspects of the graph processing landscape and describe classes of systems based on a set of dimensions. These dimensions encompass paradigms to express graph processing, different types of systems to use, coordination and communication models in distributed graph processing, partitioning techniques and different definitions related to the potential for a graph to be updated.
This survey is aimed at the experienced software engineer or researcher as well as the graduate student looking for an understanding of the landscape of solutions (and their limitations) for graph processing.

Highlights

Graph-based data is found almost everywhere, with examples such as analysing the structure of the World Wide Web [1, 2, 3], bio-informatics data representation via de Bruijn graphs [4] in metagenomics [5, 6], atoms and covalent relationships in chemistry [7], the structure of distributed computation itself [8], massive parallel learning of tree ensembles [9] and parallel topic models [10]
To achieve parallelism and harness multiple machines in clusters, it is necessary to define how to break down the graph; we provide a high-level overview of the methods employed in the best-known graph processing solutions
We explore partitioning as a relevant dimension to classify graph processing systems as they must approach it in order to enable parallel computation over graphs

Summary

Introduction

Graph-based data is found almost everywhere, with examples such as analysing the structure of the World Wide Web [1, 2, 3], bio-informatics data representation via de Bruijn graphs [4] in metagenomics [5, 6], atoms and covalent relationships in chemistry [7], the structure of distributed computation itself [8], massive parallel learning of tree ensembles [9] and parallel topic models [10]. The section "Dimension: Partitioning" presents the best-known approaches to decomposing graph-based data into computational units for parallelism and distribution, showcasing models with different levels of granularity.
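
As a rough illustration of what decomposing a graph into computational units can look like, the following Python sketch assigns vertices to partitions and separates edges that stay inside one partition from edges that cross partitions. The modulo-based assignment, the function names and the toy edge list are assumptions made for this example; they are not taken from the survey or from any specific system it covers.

```python
from collections import defaultdict


def partition_vertices(edges, num_parts):
    """Assign each integer vertex id to a partition using id % num_parts.

    This is a deliberately simple, hypothetical scheme chosen for illustration.
    """
    assignment = {}
    for u, v in edges:
        for vertex in (u, v):
            assignment.setdefault(vertex, vertex % num_parts)
    return assignment


def split_edges(edges, assignment):
    """Separate edges local to one partition from edges cut across partitions."""
    local = defaultdict(list)
    cut = []
    for u, v in edges:
        if assignment[u] == assignment[v]:
            local[assignment[u]].append((u, v))
        else:
            cut.append((u, v))
    return dict(local), cut


# Toy directed graph with five vertices (hypothetical data).
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 2)]
assignment = partition_vertices(edges, num_parts=2)
local, cut = split_edges(edges, assignment)
print("assignment:", assignment)
print("local edges per partition:", local)
print("cut edges:", cut)
```

Cut edges are the ones that would require communication between machines in a distributed setting, which is one reason the choice of partitioning scheme matters for performance.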



Journal: Journal of Big Data	Publication Date: Apr 9, 2021
Citations: 13	License type: open-access

Similar Papers

Realizing Memory-Optimized Distributed Graph Processing
Panagiotis Liakos ... Katia Papakonstantinopoulou
IEEE Transactions on Knowledge and Data Engineering | VOL. 30
01 Apr 2018

RGraph: Asynchronous graph processing based on asymmetry of remote direct memory access
Hanhua Chen ... Hai Jin
Software: Practice and Experience | VOL. 52
26 Apr 2021

UniGPS: A Unified Programming Framework for Distributed Graph Processing
Zhaokang Wang ... Yihua Huang
01 Dec 2021

Memory-Optimized Distributed Graph Processing through Novel Compression Techniques
Panagiotis Liakos ... Katia Papakonstantinopoulou
24 Oct 2016
