Abstract

Thanks to the promised improvements in performance and energy efficiency, hardware accelerators are taking momentum in many computing contexts, both in terms of variety and relative weight in the silicon area of many chips. Commonly, the way an application interacts with these hardware modules has many accelerator-specific traits and requires ad-hoc drivers that usually rely on potentially expensive system calls to manage accelerator resources and access orchestration. As a consequence, driver-based interfacing is far from uniform and can expose high latency, limiting the set of tasks suitable for acceleration. In this paper, we propose a uniform and low-latency interface based on Instruction Set Architecture (ISA) extension. All the previous studies that proposed extensions, were deeply tailored to address a single accelerator. One of the biggest disadvantages of those methods is their inability to scale. Adding more of these accelerators to one System-on-Chip (SoC) would result in ISA bloat, increasing power consumption and complexifying the decoding phase proportionally. Our proposed framework consists of a six-instruction ISA extension and the corresponding architectural support that implements the interface abstraction and the reservation logic at the hardware level. Our proposal allows controlling a broad class of integrated accelerators directly from the CPU. The proposed framework is ISA-independent, which means that it is applicable to all the existing ISAs. We implement it on the gem5 simulator by extending the RISC-V ISA. We evaluate it by simulating three compute-intensive accelerators and comparing our interfacing with a conventional driver-based one. The benchmarks highlight the performance benefits brought by our framework, with up to 10.38x speed up, as well as the ability to seamlessly support different accelerators with the same interface. The speed up advantage of our technique diminishes as the granularity of the workloads increases and the overhead for driver-based accelerators becomes less important. We also show that the impact of its hardware components on chip area and power consumption is limited.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.