Abstract

Temporal property graphs are graphs whose structure and properties change over time. Temporal graph datasets tend to be large because they store historical information, which calls for scalable analysis capabilities. We give a complete overview of Gradoop, a graph dataflow system for scalable, distributed analytics of temporal property graphs that has been continuously developed since 2015. Its graph model TPGM allows bitemporal modeling not only of vertices and edges but also of graph collections. A declarative analytical language called GrALa allows analysts to flexibly define analytical graph workflows by composing different operators that support temporal graph analysis. Because Gradoop is built on a distributed dataflow system, large temporal graphs can be processed on a shared-nothing cluster. We present the system architecture of Gradoop, its data model TPGM with composable temporal graph operators such as snapshot, difference, pattern matching, and graph grouping, as well as several implementation details. We evaluate the performance and scalability of selected operators and a composed workflow for synthetic and real-world temporal graphs with up to 283 M vertices and 1.8 B edges, and a graph lifetime of about 8 years with up to 20 M new edges per year. We also reflect on lessons learned from the Gradoop effort.

Highlights

  • Graphs are simple, yet powerful data structures to model and analyze relations between real-world data objects

  • This is achieved with a new Temporal Property Graph Model (TPGM) and powerful graph operators that can be used within analysis programs

  • First, we show scalability results for a subset of individual TPGM operators, namely snapshot, difference, time-dependent grouping, and temporal pattern matching


Summary

Introduction

Graphs are simple, yet powerful data structures to model and analyze relations between real-world data objects. By contrast, distributed graph processing systems support high scalability and parallel graph processing but typically lack an expressive graph data model and declarative query support [13,42]. The latter makes it difficult for users to formulate complex analytical tasks, as this requires profound programming and system knowledge. We present a complete system overview of Gradoop with a focus on the latest extensions for temporal property graphs. This addition required adjustments in all components of the system, as well as the integration of analytical operators tailored to temporal graphs, for example, new versions of the pattern matching and grouping operators and support for temporal graph queries.
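To make the idea of operators tailored to temporal graphs more concrete, the following is a minimal single-machine sketch of what a snapshot and a difference over valid-time intervals could look like. It is not Gradoop's API: the names `TemporalEdge`, `snapshot`, `difference`, and the sample data are hypothetical, only valid time is modeled (the bitemporal TPGM also tracks a second time dimension), and half-open intervals are an assumption made here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

FOREVER = float("inf")  # open-ended validity (assumption for this sketch)


@dataclass(frozen=True)
class TemporalEdge:
    """Hypothetical edge with a half-open valid-time interval [valid_from, valid_to)."""
    source: str
    target: str
    label: str
    valid_from: float
    valid_to: float = FOREVER


def snapshot(edges: List[TemporalEdge], t: float) -> List[TemporalEdge]:
    """Return the edges whose validity interval contains time t."""
    return [e for e in edges if e.valid_from <= t < e.valid_to]


def difference(edges: List[TemporalEdge], t1: float, t2: float) -> Dict[str, Set[TemporalEdge]]:
    """Compare the snapshots at t1 and t2 and classify each edge."""
    before = set(snapshot(edges, t1))
    after = set(snapshot(edges, t2))
    return {
        "added": after - before,       # valid at t2 only
        "removed": before - after,     # valid at t1 only
        "persistent": before & after,  # valid at both times
    }


# Illustrative data only; the years are made up.
EDGES = [
    TemporalEdge("alice", "bob", "knows", 2015, 2018),
    TemporalEdge("bob", "carol", "knows", 2017),
    TemporalEdge("alice", "carol", "knows", 2019),
]
```

With this sample data, `snapshot(EDGES, 2016)` keeps only the alice-bob edge, and `difference(EDGES, 2016, 2020)` reports that edge as removed and the other two as added. A distributed system would express the same filter and set logic as dataflow transformations over partitioned vertex and edge datasets rather than over in-memory lists.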

Graph system landscape
Temporal property graphs and query languages
Graph database and graph processing systems
Temporal graph processing systems
System architecture overview
Temporal property graph model
Graph data model
Subgraph
Snapshot
Difference
Time-dependent Graph Grouping
Temporal pattern matching
Graph transformation operators
Graph collection operators
Iterative graph algorithms
Implementation
Apache Flink
Graph representation
Programming abstractions
Graph Layouts
Graph operators
Graph storage
Evaluation
Lessons learned and ongoing work
Conclusion