An asynchronous traversal engine for graph-based rich metadata management

Dong Dai, Philip Carns, Yong Chen, Robert B Ross, John Jenkins, Nicholas Muirhead

doi:10.1016/j.parco.2016.06.002

Open Access

Journal: Parallel Computing
Publication Date: Jun 23, 2016
Citations: 3
License type: publisher-specific-oa

Affiliation: Argonne National Laboratory, Texas Tech University

Abstract

Rich metadata in high-performance computing (HPC) systems contains extended information about users, jobs, data files, and their relationships. Property graphs are a promising data model to represent heterogeneous rich metadata flexibly. Specifically, a property graph can use vertices to represent different entities and edges to record the relationships between vertices with unique annotations. The high-volume HPC use case, with millions of entities and relationships, naturally requires an out-of-core distributed property graph database, which must support live updates (to ingest production information in real time), low-latency point queries (for frequent metadata operations such as permission checking), and large-scale traversals (for provenance data mining).

Among these needs, large-scale property graph traversals are particularly challenging for distributed graph storage systems. Most existing graph systems implement a level-synchronous breadth-first search algorithm that relies on global synchronization in each traversal step. This performs well in many problem domains, but a rich metadata management system is characterized by imbalanced graphs, long traversal lengths, and concurrent workloads, each of which has the potential to introduce or exacerbate stragglers (i.e., abnormally slow steps or servers in a graph traversal) that lead to low overall throughput for synchronous traversal algorithms. Previous research indicated that the straggler problem can be mitigated by using asynchronous traversal algorithms, and many graph-processing frameworks have successfully demonstrated this approach. However, such systems require the graph to be loaded into a separate batch-processing framework instead of being iteratively accessed.

In this work, we investigate a general asynchronous graph traversal engine that can operate atop a rich metadata graph in its native format. We outline a traversal-aware query language and key optimizations (traversal-affiliate caching and execution merging) necessary for efficient performance. We further explore the effect of different graph partitioning strategies on the traversal performance for both synchronous and asynchronous traversal engines. Our experiments show that the asynchronous graph traversal engine is more efficient than its synchronous counterpart in the case of HPC rich metadata processing, where more servers are involved and larger traversals are needed. Moreover, the asynchronous traversal engine is more adaptive to different graph partitioning strategies.
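
As a rough illustration of the contrast the abstract draws, the following single-process Python sketch builds a tiny property graph and walks it two ways: step by step with an implicit barrier between steps, and as independent work items with no barrier. This is not the paper's engine or query language. The class `PropertyGraph`, both traversal functions, and the example vertices and edge labels (`ran`, `wrote`) are hypothetical names chosen for illustration, and the real system is distributed and out-of-core.

```python
from collections import defaultdict, deque


class PropertyGraph:
    """Toy in-memory property graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.vertices = {}                   # vertex id -> property dict
        self.out_edges = defaultdict(list)   # vertex id -> [(label, dst, props)]

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst, **props):
        self.out_edges[src].append((label, dst, props))


def level_synchronous_traverse(graph, start, labels):
    """Follow one edge label per step; a step finishes before the next starts."""
    frontier = {start}
    for label in labels:
        next_frontier = set()
        for vid in frontier:
            for edge_label, dst, _props in graph.out_edges.get(vid, []):
                if edge_label == label:
                    next_frontier.add(dst)
        # In a distributed setting, a global barrier would sit here: every
        # server waits until the slowest one has finished the current step.
        frontier = next_frontier
    return frontier


def asynchronous_traverse(graph, start, labels):
    """Process (vertex, step) work items independently, with no step barrier."""
    pending = deque([(start, 0)])
    seen = {(start, 0)}          # avoid re-expanding the same vertex at a step
    results = set()
    while pending:
        vid, step = pending.popleft()
        if step == len(labels):
            results.add(vid)     # this path has completed every step
            continue
        for edge_label, dst, _props in graph.out_edges.get(vid, []):
            item = (dst, step + 1)
            if edge_label == labels[step] and item not in seen:
                seen.add(item)
                pending.append(item)
    return results


# Hypothetical rich-metadata example: a user ran a job, the job wrote a file.
g = PropertyGraph()
g.add_vertex("user:alice", type="user")
g.add_vertex("job:42", type="job")
g.add_vertex("file:out.dat", type="file")
g.add_edge("user:alice", "ran", "job:42", ts=1)
g.add_edge("job:42", "wrote", "file:out.dat", ts=2)

sync_result = level_synchronous_traverse(g, "user:alice", ["ran", "wrote"])
async_result = asynchronous_traverse(g, "user:alice", ["ran", "wrote"])
```

Both functions return the same set of vertices. The difference is only in scheduling: in the second, a work item at a later step can be processed while items at earlier steps are still pending, which is the property that lets an asynchronous engine tolerate stragglers.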

Similar Papers

GraphTrek: Asynchronous Graph Traversal for Property Graph-Based Metadata Management
Dong Dai ... Philip Carns
01 Sep 2015

GRAM: A GPU-Based Property Graph Traversal and Query for HPC Rich Metadata Management
Wenke Li ... Xuanhua Shi
01 Jan 2018

Using Property Graphs for Rich Metadata Management in HPC Systems
Dong Dai ... Robert B Ross
01 Nov 2014

Challenges for Implementing FAIR Digital Objects with High Performance Workflows
Line Pouchard ... Bogdan Nicolae
Research Ideas and Outcomes | VOL. 8
12 Oct 2022
