Stream processing, which involves real-time computation of data as it is created or received, is vital for various applications, specifically wireless communication. The evolving protocols, the requirement for high-throughput, and the challenges of handling diverse processing patterns make it demanding. Traditional platforms grapple with meeting real-time throughput and latency requirements due to large data volume, sequential and indeterministic data arrival, and variable data rates, leading to inefficiencies in memory access and parallel processing. We present Canalis, a throughput-optimized framework designed to address these challenges, ensuring high-performance while achieving low energy consumption. Canalis is a hardware-software co-designed system. It includes a programmable spatial architecture, FluxSPU (Flux Stream Processing Unit), proposed by this work to enhance data throughput and energy efficiency. FluxSPU is accompanied by a software stack that eases the programming process. We evaluated Canalis with eight distinct benchmarks. When compared to CPU and GPU in mobile SoC to demonstrate the effectiveness of domain specialization, Canalis achieves an average speedup of 13.4× and 6.6×, and energy savings of 189.8× and 283.9×, respectively. In contrast to equivalent ASICs of the benchmarks, the average energy overhead of Canalis is within 2.4×, successfully maintaining generalizations without incurring significant overhead.
Read full abstract