Abstract

Historically, hardware acceleration technologies have either been application-specific, therefore lacking in flexibility, or fully programmable, thereby suffering from notable inefficiencies on an application-by-application basis. To address the growing need for domain-specific acceleration technologies, this paper describes a design methodology (i) to automatically generate a domain-specific coarse-grained array from a set of representative applications and (ii) to introduce limited forms of architectural generality to increase the likelihood that additional applications can be successfully mapped onto it. In particular, coarse-grained arrays generated using our approach are intended to be integrated into customizable processors that use application-specific instruction set extensions to accelerate performance and reduce energy; rather than implementing these extensions using application-specific integrated circuit (ASIC) logic, which lacks flexibility, they can be synthesized onto our reconfigurable array instead, allowing the processor to be used for a variety of applications in related domains. Results show that our array is around 2× slower and 15× larger than an ultimately efficient ASIC implementation, and thus far more efficient than fieldprogrammable gate arrays (FPGAs), which are known to be 3-4× slower and 20-40× larger. Additionally, we estimate that our array is usually around 2× larger and 2× slower than an accelerator synthesized using traditional datapath merging, which has, if any, very limited flexibility beyond the design set of DFGs.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call