Distributed Processing System Research Articles

Querying graphs and conducting graph analytics have become important in data processing, since many real applications deal with massive graphs such as online social networks, the Semantic Web, and knowledge graphs. Over the years, many distributed graph processing systems have been developed to support graph analytics using various programming models, and many graph query languages have been proposed. A natural question is how to integrate graph data and traditional non-graph data in a distributed system so that users can conduct analytics over both. There are two issues. The first concerns expressiveness: how to specify graph analytics as well as data analytics in a query language. The second concerns efficiency: how to process such analytics in a distributed system. For the first issue, SQL is a strong candidate, since it is a well-accepted language for data processing, and we concentrate on SQL for graph analytics. Our early work shows that graph analytics can be supported by SQL, moving from “semiring + while” to “relational algebra + while” via enhanced recursive SQL queries. In this article, we focus on the second issue: how to process such enhanced recursive SQL queries based on the GAS (Gather-Apply-Scatter) model, under which efficient graph processing systems can be developed. To demonstrate the efficiency, we implemented a system by tightly coupling Spark SQL and GraphX on Spark, one of the most popular in-memory data-flow processing platforms. First, we enhance Spark SQL with support for the enhanced recursive SQL queries for graph analytics, so that graph analytics can be processed using a distributed SQL engine alone. Second, we propose new transformation rules to optimize and translate the operations of recursive SQL queries into GraphX operations, so that graph analytics expressed in SQL can be processed in a way similar to a distributed graph processing system using the APIs that system provides. We conduct extensive performance studies of graph analytics on large real graphs. We show that our approach achieves similar or even higher efficiency compared with the built-in graph algorithms of existing graph processing systems.
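As a rough illustration of the Gather-Apply-Scatter model named in the abstract, the following single-machine Python sketch computes single-source shortest paths. It is not the authors' Spark SQL/GraphX implementation: the function name `gas_sssp`, the edge-list input format, and the synchronous superstep loop are all assumptions made for this example, and edge weights are assumed to be non-negative. The gather step uses the (min, +) semiring, which loosely mirrors the "semiring + while" view mentioned above.

```python
import math

def gas_sssp(edges, source):
    """Hypothetical GAS-style shortest paths.

    edges: list of (src, dst, weight) with non-negative weights (assumption).
    Returns a dict mapping each vertex to its distance from `source`.
    """
    vertices = {v for s, d, _ in edges for v in (s, d)} | {source}
    in_edges = {v: [] for v in vertices}
    out_edges = {v: [] for v in vertices}
    for s, d, w in edges:
        in_edges[d].append((s, w))
        out_edges[s].append(d)

    dist = {v: math.inf for v in vertices}
    dist[source] = 0.0
    active = set(out_edges[source])  # vertices to process in the next superstep

    while active:  # the "while" part: iterate until no vertex changes
        updates = {}
        for v in active:
            # Gather: combine in-neighbour values with the (min, +) semiring.
            acc = min((dist[u] + w for u, w in in_edges[v]), default=math.inf)
            # Apply: keep the new value only if it improves the vertex state.
            if acc < dist[v]:
                updates[v] = acc
        next_active = set()
        for v, d in updates.items():
            dist[v] = d
            # Scatter: activate out-neighbours of every vertex that changed.
            next_active.update(out_edges[v])
        active = next_active
    return dist
```

For example, `gas_sssp([("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1)], "a")` assigns distance 3 to `c` (via `b`) and 4 to `d`. Updates are buffered per superstep so that every gather reads the previous superstep's values, which is one simple way to model synchronous execution.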


Natural graphs with skewed distributions raise unique challenges to distributed graph computation and partitioning. Existing graph-parallel systems usually use a “one-size-fits-all” design that uniformly processes all vertices, which either suffers from notable load imbalance and high contention for high-degree vertices (e.g., Pregel and GraphLab) or incurs high communication cost and memory consumption even for low-degree vertices (e.g., PowerGraph and GraphX). In this article, we argue that skewed distributions in natural graphs also necessitate differentiated processing of high-degree and low-degree vertices. We then introduce PowerLyra, a new distributed graph processing system that embraces the best of both worlds of existing graph-parallel systems. Specifically, PowerLyra uses centralized computation for low-degree vertices to avoid frequent communication and distributes the computation for high-degree vertices to balance workloads. PowerLyra further provides an efficient hybrid graph partitioning algorithm (i.e., hybrid-cut) that combines edge-cut (for low-degree vertices) and vertex-cut (for high-degree vertices) with heuristics. To improve cache locality of inter-node graph accesses, PowerLyra further provides a locality-conscious data layout optimization. PowerLyra is implemented based on the latest GraphLab and can seamlessly support various graph algorithms running in both synchronous and asynchronous execution modes. A detailed evaluation on three clusters using various graph-analytics and MLDM (Machine Learning and Data Mining) applications shows that PowerLyra outperforms PowerGraph by up to 5.53X (from 1.24X) and 3.26X (from 1.49X) for real-world and synthetic graphs, respectively, and is much faster than other systems like GraphX and Giraph, with much less memory consumption. A port of hybrid-cut to GraphX further confirms the efficiency and generality of PowerLyra.
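The hybrid-cut idea described above can be sketched in a few lines of Python. The abstract does not spell out the exact placement rule, so the version below is one plausible reading and not PowerLyra's actual code: the function name `hybrid_cut`, the in-degree `threshold` parameter, integer vertex IDs, and the use of `vertex_id % num_partitions` as the hash are all assumptions for illustration. In-edges of a low-degree vertex are kept together on the partition of the target (edge-cut style), while in-edges of a high-degree vertex are spread by their source (vertex-cut style).

```python
from collections import Counter

def hybrid_cut(edges, num_partitions, threshold):
    """Hypothetical hybrid-cut placement.

    edges: list of (src, dst) pairs with integer vertex IDs (assumption).
    threshold: in-degree above which a target vertex counts as high-degree.
    Returns a dict mapping each edge to a partition index.
    """
    in_degree = Counter(dst for _, dst in edges)
    placement = {}
    for src, dst in edges:
        if in_degree[dst] <= threshold:
            # Low-degree target: co-locate all of its in-edges with the target.
            owner = dst
        else:
            # High-degree target: distribute its in-edges by source vertex.
            owner = src
        placement[(src, dst)] = owner % num_partitions  # stand-in hash function
    return placement
```

With `edges = [(1, 0), (2, 0), (3, 0), (4, 0), (0, 5), (2, 5)]`, two partitions, and a threshold of 2, vertex 0 (in-degree 4) has its in-edges split across both partitions, while both in-edges of vertex 5 land on the same partition. A real system would also need to place vertex replicas and choose the threshold empirically; this sketch covers only edge placement.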


Articles published on Distributed Processing System

Fast Failure Recovery in Vertex-Centric Distributed Graph Processing Systems

Minimizing cost by reducing scaling operations in distributed stream processing

Property-Based Testing for Spark Streaming

SPFC: An Effective Optimization for Vertex-Centric Graph Processing Systems

SQL-G: Efficient Graph Analytics by SQL

Design of the Intelligent LBS Service: Using Big Data Distributed Processing System

Throughput Scalability Analysis of Fork-Join Queueing Networks

Simulation modelling of the heterogeneous distributed information processing systems

Start late or finish early

PowerLyra

Profiling distributed graph processing systems through visual analytics

Multicriterion problem of allocation of resources in the heterogeneous distributed information processing systems

Survey of Apache Storm Scheduler

Realizing Memory-Optimized Distributed Graph Processing

Lazygraph

High-Level Programming Abstractions for Distributed Graph Processing

GraphDuo: A Dual-Model Graph Processing Framework

Design of GlusterFS Based Big Data Distributed Processing System in Smart Factory

Emergence of anti-coordination through reinforcement learning in generalized minority games

Distributed stream processing system for join operations
