Abstract

Low-latency stream data processing is a key enabler for online data analysis applications, such as detecting anomalies and change points in data streams continuously generated by sensors and networking services. Existing stream processing frameworks follow either a micro-batch or a one-at-a-time processing methodology. Apache Spark Streaming employs the micro-batch methodology, in which data analysis is repeatedly performed on the set of data that arrives during a short time period, called a micro batch. The rich set of data analysis libraries provided by Spark, such as machine learning and graph processing, can be applied to these micro batches. However, a drawback of the micro-batch methodology is high latency in detecting anomalies and change points, because data are first accumulated into a micro batch (e.g., 1 sec long) and only then analyzed. In this paper, we propose to offload one-at-a-time analysis functions onto an FPGA-based 10Gbit Ethernet network interface card (FPGA NIC) that cooperates with the Spark Streaming framework, in order to significantly reduce processing latency and improve processing throughput. We implemented word count and change-point detection applications on Spark Streaming with our FPGA NIC, on which a one-at-a-time analysis logic is implemented. Experimental results demonstrate that word count throughput is improved by 22x and change-point detection latency is reduced by 94.12% compared to the original Spark Streaming. Our approach complements the existing micro-batch data analysis framework with ultra-low-latency one-at-a-time analysis logic.
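The latency gap the abstract describes can be illustrated with a minimal sketch (not from the paper): under micro-batch processing, a record is only examined when its batch closes, so its detection latency is the time remaining until the next batch boundary, whereas one-at-a-time processing inspects each record on arrival. The arrival times and the 1-second batch interval below are illustrative assumptions.

```python
import math

BATCH_INTERVAL = 1.0  # seconds; mirrors the abstract's 1 sec micro-batch example

# Hypothetical arrival times (seconds) of records carrying an anomaly signal
arrivals = [0.10, 0.35, 0.80, 1.20, 1.95, 2.40]

# One-at-a-time: each record is inspected on arrival, so detection
# latency is essentially just the per-record processing time (~0 here).
one_at_a_time_latency = [0.0 for _ in arrivals]

# Micro-batch: a record is only examined when its batch closes, i.e. at
# the next multiple of BATCH_INTERVAL after it arrives.
micro_batch_latency = [
    math.ceil(t / BATCH_INTERVAL) * BATCH_INTERVAL - t for t in arrivals
]

avg = sum(micro_batch_latency) / len(micro_batch_latency)
print(f"avg micro-batch detection latency: {avg:.3f} s")  # ~0.533 s for these arrivals
print(f"avg one-at-a-time detection latency: {sum(one_at_a_time_latency):.3f} s")
```

With a uniform arrival process, the expected micro-batch detection latency is about half the batch interval, which is the overhead the FPGA NIC offload is designed to eliminate for the detection path.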
