Abstract

GeaFlow is a distributed dataflow system optimized for streaming graph processing. It has been widely adopted at Ant Group, serving scenarios that range from risk control of financial activities to analytics on social networks and knowledge graphs. It is built on a base with full-fledged stateful stream processing capabilities and extended with a series of graph-aware optimizations that address the space explosion and programming complexity of conventional join-based approaches. We propose new state backends and streaming operators that facilitate processing on dynamic graph-structured datasets and reduce the space consumed by state. We develop a hybrid domain-specific language that embeds Gremlin into SQL, supporting both table and graph abstractions over streaming data. In addition to streaming workloads, GeaFlow is also used extensively for certain batch processing jobs. In the largest deployments to date, GeaFlow processes tens of millions of events per second and manages hundreds of terabytes of state.

