Abstract

Stream join is widely used to extract key information from multi-source data streams and is an important supporting technology for big data processing. When two large data streams are joined, the sheer volume of join-predicate evaluations can easily make the join a performance bottleneck. To improve performance, stream join systems usually expand through parallelism or distribution. However, multi-core parallel stream join systems cannot cope with large-scale data streams because their scalability is limited by the number of CPU cores, while distributed stream join systems incur the overhead of a distributed framework, which severely reduces hardware processing efficiency. To achieve efficient, large-scale expansion, this paper proposes FJoin, a stream join system that scales up with an FPGA accelerator. FJoin performs High-Parallel Flow Join: after multiple stream tuples are loaded, the data of the join window flows through once to complete all join calculations. For join predicates whose logic is easy to implement on an FPGA, a large number of basic join units are connected in series to form a deep join pipeline that achieves large-scale parallelism. The host CPU and the FPGA device coordinate control, dividing the continuous stream join computation into independent small-batch tasks while efficiently ensuring the completeness of the parallel stream join. FJoin is implemented on a platform equipped with an FPGA accelerator card. Tests on large-scale real-world data sets show that, with a single FPGA accelerator card, FJoin speeds up join calculation 16-fold and achieves 5 times the system throughput of the current best stream join system deployed on a 40-node cluster, with latency that meets real-time stream processing requirements.
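The abstract describes High-Parallel Flow Join and the small-batch task division only at a high level. The Python sketch below is a software model, not FJoin's implementation. It assumes one loaded tuple per basic join unit, count-based sliding windows, an equality predicate on a key, and a batching policy in which a batch is a run of consecutive tuples from one stream. The names `flow_join`, `batched_stream_join`, `window_size` and `batch_size` are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Optional, Tuple

Row = Tuple[int, str]  # hypothetical stream tuple: (join key, payload)
Pred = Callable[[Row, Row], bool]


def flow_join(loaded: List[Row], window: List[Row], pred: Pred) -> List[Tuple[Row, Row]]:
    """Cycle-by-cycle model of a chain of basic join units.

    Unit i holds loaded[i]. One window tuple enters unit 0 per cycle and every
    tuple advances one unit per cycle, so each window tuple meets each loaded
    tuple exactly once during a single pass over the window.
    """
    if not loaded or not window:
        return []
    n = len(loaded)
    stages: List[Optional[Row]] = [None] * n  # window tuple currently in each unit
    feed = iter(window)
    results: List[Tuple[Row, Row]] = []
    # len(window) cycles to inject every tuple, plus n - 1 cycles to drain the chain.
    for _ in range(len(window) + n - 1):
        stages = [next(feed, None)] + stages[:-1]  # shift the pipeline by one unit
        for unit, w in enumerate(stages):
            # In hardware all units would compare in parallel within one cycle.
            if w is not None and pred(loaded[unit], w):
                results.append((loaded[unit], w))
    return results


def batched_stream_join(
    events: List[Tuple[str, Row]], window_size: int, batch_size: int, pred: Pred
) -> List[Tuple[Row, Row]]:
    """Split a two-stream ("R"/"S") arrival sequence into small batch tasks.

    Assumed policy: a batch is at most batch_size consecutive tuples from one
    stream. It is loaded into the join units, the opposite stream's current
    window flows through once, and only then is the batch appended to its own
    window. A matching (r, s) pair is emitted once, when the later of the two
    arrives, provided the earlier one is still inside its window.
    """
    windows: Dict[str, Deque[Row]] = {
        "R": deque(maxlen=window_size),
        "S": deque(maxlen=window_size),
    }
    out: List[Tuple[Row, Row]] = []
    i = 0
    while i < len(events):
        sid = events[i][0]
        batch: List[Row] = []
        while i < len(events) and events[i][0] == sid and len(batch) < batch_size:
            batch.append(events[i][1])
            i += 1
        other = "S" if sid == "R" else "R"
        for mine, theirs in flow_join(batch, list(windows[other]), pred):
            out.append((mine, theirs) if sid == "R" else (theirs, mine))  # keep (r, s) order
        windows[sid].extend(batch)  # the oldest tuples are evicted automatically
    return out
```

For example, `batched_stream_join(events, 10, 4, lambda a, b: a[0] == b[0])` performs an equi-join on the key field. The modeled design choice is that the batch stays resident in the join units while the window streams past, so the window is read once per batch rather than once per tuple.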
