Abstract

We believe that, in order to drastically improve energy efficiency, future processor architectures will make extensive use of accelerators, from single-chip implementations to datacenter-level integration, as custom-designed accelerators often provide a 10-1000X improvement in performance/energy efficiency over general-purpose processors [1]. Such an accelerator-rich architecture presents a fundamental departure from the classical von Neumann architecture, which emphasizes efficient sharing of a common pipeline among the executions of different instructions, an elegant solution when computing resources are scarce. In contrast, an accelerator-rich architecture features heterogeneity and customization for energy efficiency, making it better suited to energy-constrained designs where silicon resources are abundant. There are several concerns with the extensive use of accelerators: (1) low utilization, (2) narrow workload coverage, (3) high design cost, and (4) unfamiliar programming interfaces. In this talk, I shall discuss recent progress and ongoing work to address these concerns. Due to tight power and thermal budgets, only a fraction of the computing elements on-chip can be active in future technologies (so-called dark silicon [2]). This means low utilization (but much higher energy efficiency) will be an inherent characteristic of future chips. To address the problem of narrow workload coverage, we look to the use of composable accelerators and programmable fabrics to virtualize and accelerate larger blocks of computation [3]. The design cost can be properly managed by leveraging recent advances in high-level synthesis coupled with efficient generation of parameterized architecture templates. The programming interface is a critical issue for the successful adoption of accelerator-rich architectures; it needs to support extensive use of accelerators from single-chip to datacenter scales [4]. We have made significant progress in compilation and runtime support to enable programmers to make use of existing programming interfaces (e.g., C/C++ for computation tasks and MapReduce or Hadoop for large-scale distributed computation in datacenters) for efficient use of accelerators at all scales.
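To make the last point concrete, the sketch below illustrates, in ordinary C++, the kind of programming interface such compilation and runtime support targets: the application expresses a data-parallel computation against a plain host-side API, and a runtime layer decides whether to dispatch it to an accelerator or fall back to the CPU. This is only a minimal illustration; the `acc::offload` name and its signature are hypothetical placeholders, not the actual API from the referenced work, and the "dispatch" here is simply a CPU loop so the example remains self-contained.

```cpp
// Hypothetical sketch of an accelerator-offload interface.
// The names in namespace acc are placeholders, not the actual runtime API
// from the talk. A real runtime would query accelerator availability and
// dispatch the kernel; here we fall back to a sequential CPU loop so the
// example compiles and runs on its own.
#include <cstddef>
#include <iostream>
#include <vector>

namespace acc {
// Run a data-parallel kernel over the index range [0, n).
// In this sketch the "dispatch" is just a host-side loop.
template <typename Kernel>
void offload(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) {
        kernel(i);
    }
}
}  // namespace acc

int main() {
    std::vector<float> data(8, 1.0f);
    const float scale = 2.5f;

    // The application is written against an ordinary C++ interface;
    // the runtime (hypothetically) chooses the execution target.
    acc::offload(data.size(), [&](std::size_t i) { data[i] *= scale; });

    for (float v : data) std::cout << v << ' ';
    std::cout << '\n';
    return 0;
}
```

The design point this illustrates is that the programmer keeps writing conventional C/C++ (or MapReduce/Hadoop jobs at datacenter scale), while the decision of whether and where to use an accelerator is left to the compiler and runtime.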
