Abstract

Embedded CPUs typically use much less power than desktop or server CPUs but provide limited or no support for floating-point arithmetic. Hybrid reconfigurable CPUs combine fixed and reconfigurable computing fabrics to balance better execution performance and power consumption. We show how a Stretch S6 hybrid reconfigurable CPU (S6) can be extended to natively support double precision floating-point arithmetic. For lower precision number formats, multiple parallel arithmetic units can be implemented. We evaluate if the superlinear performance improvement of floating-point multiplication on reconfigurable fabrics can be exploited in the framework of a hybrid reconfigurable CPU. We provide an in-depth investigation of data paths to and from the S6 reconfigurable fabric and present peak and sustained throughput as a function of wide registers used and total operand size. We demonstrate the effect of the given interface when using a floating-point fused multiply-accumulate (FMA) SIMD unit to accelerate the LINPACK benchmark. We identify a mismatch between the size of the S6s reconfigurable fabric and the available interface bandwidth as the major bottleneck limiting performance which makes it a poor choice for scientific workloads relying on native support for floating-point arithmetic.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call