Abstract

Stateful data analytics frameworks have emerged to provide fresh, low-latency results for big data processing. It is desirable to support a fine-grained data model in mainstream data processing frameworks such as Spark. However, Spark adopts a coarse-grained data model to facilitate parallelization, which makes fine-grained data access in stateful data analytics very challenging. In this paper, we introduce a stateful component, the Resilient State Table (RST), to the Spark framework. To bridge the gap between Spark's coarse-grained data model and the fine-grained state access requirements of stateful data analytics, we devise a programming model for RST that interacts seamlessly with Spark's coarse-grained memory representation and enables users to query and update state entries at fine granularity through Spark-like programming interfaces. Performance evaluation across various application fields demonstrates that the proposed solution improves latency, fault tolerance, and scalability.
