Efficient Fine-grained Processor-logic Interactions on the Cache-coherent Zynq Platform

Alexander Kroh,Oliver Diessel

doi:10.1145/3277506

Abstract

The introduction of cache-coherent processor-logic interconnects in CPU-FPGA platforms promises low-latency communication between CPU and FPGA fabrics. This reduced latency improves the performance of heterogeneous systems implemented on such devices and gives rise to new software architectures that can better use the available hardware. Via an extended study accelerating the software task scheduler of a microkernel operating system, this article reports on the potential for accelerating applications that exhibit fine-grained interactions. In doing so, we evaluate the performance of direct and cache-coherent communication methods for applications that involve frequent, low-bandwidth transactions between CPU and programmable logic. In the specific case we studied, we found that replacing a highly optimised software implementation of the task scheduler with an FPGA-based scheduler reduces the cost of communication between two software threads by 5.5%. We also found that, while hardware acceleration reduces cache footprint, we still observe execution time variability because of other non-deterministic features of the CPU.

Full Text