Exploring the Tradeoffs of Application-Specific Processing

Joshua C Schabel,Paul D Franzon

doi:10.1109/jetcas.2018.2849939

Abstract

Non-traditional processing schemes continue to grow in popularity as a means to achieve high performance with greater energy-efficiency. Data-centric processing is one such scheme that targets functional-specialization and memory bandwidth limitations, opening up small processors to wide memory IO. These functional-specific accelerators prove to be an essential component to achieve energy-efficiency and performance, but purely application-specific integrated circuit accelerators have expensive design overheads with limited reusability. We propose an architecture that combines existing processing schemes utilizing CGRAs for dynamic data path configuration as a means to add flexibility and reusability to data-centric acceleration. While flexibility adds a large energy overhead, performance can be regained through intelligent mappings to the CGRA for the functions of interest, while reusability can be gained through incrementally adding general purpose functionality to the processing elements. Building upon previous work accelerating sparse encoded neural networks, we present a CGRA architecture for mapping functional accelerators operating at 500 MHz in 32 nm. This architecture achieves a latency-per-function within $2{\times}$ of its function-specific counterparts with energy-per-operation increases between 21–188 $\times$ , and energy-per-area increases between 1.8–3.6 $\times$ .

Full Text