SCA: Search-Based Computing Hardware Architecture with Precision Scalable and Computation Reconfigurable Scheme.

Liang Chang,Xin Zhao,Jun Zhou

doi:10.3390/s22218545

Liang Chang, Xin Zhao + Show 1 more

Open Access

https://doi.org/10.3390/s22218545

Copy DOI

Abstract

Deep neural networks have been deployed in various hardware accelerators, such as graph process units (GPUs), field-program gate arrays (FPGAs), and application specific integrated circuit (ASIC) chips. Normally, a huge amount of computation is required in the inference process, creating significant logic resource overheads. In addition, frequent data accessions between off-chip memory and hardware accelerators create bottlenecks, leading to decline in hardware efficiency. Many solutions have been proposed to reduce hardware overhead and data movements. For example, specific lookup-table (LUT)-based hardware architecture can be used to mitigate computing operation demands. However, typical LUT-based accelerators are affected by computational precision limitation and poor scalability issues. In this paper, we propose a search-based computing scheme based on an LUT solution, which improves computation efficiency by replacing traditional multiplication with a search operation. In addition, the proposed scheme supports different precision multiple-bit widths to meet the needs of different DNN-based applications. We design a reconfigurable computing strategy, which can efficiently adapt to the convolution of different kernel sizes to improve hardware scalability. We implement a search-based architecture, namely SCA, which adopts an on-chip storage mechanism, thus greatly reducing interactions with off-chip memory and alleviating bandwidth pressure. Based on experimental evaluation, the proposed SCA architecture can achieve 92%, 96% and 98% computational utilization for computational precision of 4 bit, 8 bit and 16 bit, respectively. Compared with state-of-the-art LUT-based architecture, the efficiency can be improved four-fold.

Full Text