Abstract
Temporal property graphs are graphs whose structure and properties change over time. Temporal graph datasets tend to be large because they store historical information, which calls for scalable analysis capabilities. We give a complete overview of Gradoop, a graph dataflow system for scalable, distributed analytics of temporal property graphs, which has been continuously developed since 2015. Its graph model, TPGM, allows bitemporal modeling not only of vertices and edges but also of graph collections. A declarative analytical language called GrALa allows analysts to flexibly define analytical graph workflows by composing different operators that support temporal graph analysis. Because Gradoop is built on a distributed dataflow system, large temporal graphs can be processed on a shared-nothing cluster. We present the system architecture of Gradoop, its data model TPGM with composable temporal graph operators such as snapshot, difference, pattern matching, and graph grouping, as well as several implementation details. We evaluate the performance and scalability of selected operators and of a composed workflow on synthetic and real-world temporal graphs with up to 283 M vertices and 1.8 B edges, and a graph lifetime of about 8 years with up to 20 M new edges per year. We also reflect on lessons learned from the Gradoop effort.
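To make bitemporal modeling and the snapshot and difference operators more concrete, the Python sketch below gives each vertex and edge two time intervals: a valid time (when a fact holds in the modeled world) and a transaction time (when the fact was recorded). It is not Gradoop's Java API; the names `Interval`, `Vertex`, `Edge`, `snapshot`, and `difference`, the half-open `[start, end)` interval semantics, the use of plain integer years as timestamps, and the labels `"added"`, `"removed"`, and `"persisted"` are all hypothetical choices made for illustration. The snapshot also drops edges whose endpoints are not visible, so the result stays a consistent graph.

```python
from dataclasses import dataclass

INF = float("inf")


@dataclass(frozen=True)
class Interval:
    """Half-open interval [start, end); an open end is modeled as infinity (assumption)."""
    start: float
    end: float = INF

    def contains(self, t: float) -> bool:
        return self.start <= t < self.end


@dataclass(frozen=True)
class Vertex:
    id: str
    valid: Interval  # valid time: when the vertex exists in the modeled world
    tx: Interval     # transaction time: when the vertex is recorded in the store


@dataclass(frozen=True)
class Edge:
    id: str
    source: str
    target: str
    valid: Interval
    tx: Interval


def _visible(element, valid_at, tx_at):
    # An element is visible if both of its time dimensions contain the query points.
    return element.valid.contains(valid_at) and element.tx.contains(tx_at)


def snapshot(vertices, edges, valid_at, tx_at):
    """Return the ids of vertices and edges in the graph state as of (valid_at, tx_at)."""
    vs = {v.id for v in vertices if _visible(v, valid_at, tx_at)}
    # Keep only edges whose endpoints are both part of the snapshot.
    es = {e.id for e in edges
          if _visible(e, valid_at, tx_at) and e.source in vs and e.target in vs}
    return vs, es


def difference(vertices, edges, t1, t2, tx_at):
    """Classify each element by comparing the valid-time snapshots at t1 and t2."""
    v1, e1 = snapshot(vertices, edges, t1, tx_at)
    v2, e2 = snapshot(vertices, edges, t2, tx_at)

    def classify(before, after):
        return {i: ("persisted" if i in before and i in after
                    else "added" if i in after else "removed")
                for i in before | after}

    return classify(v1, v2), classify(e1, e2)


# Usage: a tiny social graph; timestamps are years.
vertices = [
    Vertex("alice", Interval(2010), Interval(2010)),
    Vertex("bob", Interval(2012, 2016), Interval(2012)),
    Vertex("carol", Interval(2015), Interval(2015)),
]
edges = [
    Edge("knows1", "alice", "bob", Interval(2013, 2016), Interval(2013)),
    Edge("knows2", "alice", "carol", Interval(2015), Interval(2015)),
]
print(snapshot(vertices, edges, 2014, 2020))
print(difference(vertices, edges, 2014, 2017, 2020))
```

Querying the same valid time at an earlier transaction time (for example `snapshot(vertices, edges, 2014, 2012)`) hides `knows1`, because that edge had not yet been recorded in 2012. This separation of "when it was true" from "when we knew it" is what the bitemporal model adds over a single time dimension.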
Highlights
Graphs are simple, yet powerful data structures to model and analyze relations between real-world data objects
Gradoop supports temporal graph analysis with a new Temporal Property Graph Model (TPGM) and powerful graph operators that can be used within analysis programs
Scalability results are shown for a subset of individual TPGM operators, namely snapshot, difference, time-dependent grouping, and temporal pattern matching
Summary
Graphs are simple, yet powerful data structures to model and analyze relations between real-world data objects. Distributed graph processing systems support high scalability and parallel graph processing but typically lack an expressive graph data model and declarative query support [13,42]. This lack makes it difficult for users to formulate complex analytical tasks, as doing so requires profound programming and system knowledge. We present a complete system overview of Gradoop with a focus on the latest extensions for temporal property graphs. These extensions required adjustments in all components of the system, as well as the integration of analytical operators tailored to temporal graphs, for example, new versions of the pattern matching and grouping operators and support for temporal graph queries.
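Time-dependent grouping, one of the operators named above, summarizes a graph by collapsing elements that share a grouping key, where the key may include a time component. The Python sketch below assumes a key made of the vertex label plus the calendar year (UTC) of the vertex's valid-from timestamp, given in epoch milliseconds, and counts the vertices and edges in each group. The names `TVertex`, `TEdge`, `year_of`, and `group_by_label_and_year` are hypothetical and do not reflect Gradoop's actual grouping API or its supported key functions.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class TVertex:
    id: str
    label: str
    valid_from: int  # start of valid time, epoch milliseconds (assumed unit)


@dataclass(frozen=True)
class TEdge:
    source: str
    target: str
    label: str


def year_of(ms: int) -> int:
    """Time-based key function: map an epoch-millisecond timestamp to its UTC year."""
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc).year


def group_by_label_and_year(vertices, edges):
    """Collapse vertices by (label, year) and edges by (source group, label, target group)."""
    key = {v.id: (v.label, year_of(v.valid_from)) for v in vertices}
    vertex_counts = Counter(key.values())
    edge_counts = Counter((key[e.source], e.label, key[e.target]) for e in edges)
    return dict(vertex_counts), dict(edge_counts)


def ms(year: int, month: int, day: int) -> int:
    # Helper for the usage example: midnight UTC of the given date, in epoch ms.
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp() * 1000)


# Usage: three persons who joined in 2019 or 2020, connected by "knows" edges.
vertices = [
    TVertex("a", "Person", ms(2019, 3, 1)),
    TVertex("b", "Person", ms(2019, 11, 20)),
    TVertex("c", "Person", ms(2020, 7, 15)),
]
edges = [TEdge("a", "b", "knows"), TEdge("a", "c", "knows"), TEdge("b", "c", "knows")]
print(group_by_label_and_year(vertices, edges))
```

The summary graph has one super-vertex per (label, year) pair and one super-edge per pair of connected groups, each annotated with a count. A distributed implementation would compute the same keys per element and then aggregate by key in parallel; this sketch does everything in a single process.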