Abstract

Coupling processors with acceleration hardware is an effective way to improve the energy efficiency of embedded systems. Many-core is nowadays a dominant design paradigm for SoCs, which opens new challenges and opportunities for designing HW blocks. Exploring acceleration solutions that fit naturally into well-established parallel programming models, and that can be added incrementally on top of existing parallel applications, is thus extremely important. In this paper we focus on tightly-coupled multi-core cluster architectures, representative of the basic building block of the most recent many-cores, and we enhance them with dedicated HW processing units (HWPUs). We propose an architecture where the HWPUs share the same L1 data memory through which the processors also communicate, implementing a zero-copy communication model. High-level synthesis (HLS) tools are used to generate the HW blocks, and a custom wrapper then interfaces them to the tightly-coupled cluster. We validate our proposal on RTL models, running both synthetic workloads and real applications. Experimental results demonstrate that, on average, our solution provides nearly identical performance to traditional private-memory coarse-grained accelerators, while achieving up to 32 percent better performance/area/watt and requiring only minimal modifications to legacy parallel codes.

