Abstract

Generative adversarial networks (GANs) continue to achieve breakthroughs across many machine learning tasks. Popular GANs typically rely on computation-intensive deconvolution operations, which limits their use in real-time applications. Prior works have proposed several deconvolution accelerators, but they suffer from serious drawbacks such as computation imbalance and large memory requirements. In this article, we first introduce a novel fast transformation algorithm (FTA) for deconvolution that resolves the computation imbalance problem and eliminates the extra memory needed for overlapped partial sums. It also significantly reduces the computational complexity of various types of deconvolution. Based on the FTA, we develop a fast computing core (FCC) and a corresponding computing array that compute deconvolution efficiently. We then optimize the dataflow and storage scheme to further reuse on-chip memory and improve computation efficiency. Finally, we present a computation-efficient hardware architecture for GANs and validate it on several GAN benchmarks, including deep convolutional GAN (DCGAN), energy-based GAN (EBGAN), and Wasserstein GAN (WGAN). Experimental results show that our design reaches 2211 GOPS at a 185-MHz working frequency on an Intel Stratix 10 SX field-programmable gate array (FPGA) board with satisfactory visual results. In summary, the proposed design achieves more than 2× hardware-efficiency improvement over previous designs while drastically reducing the storage requirement.
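To make the computation imbalance and overlapped partial sums concrete, the following minimal Python sketch (not the paper's FTA; the 1-D input, kernel size, and stride are hypothetical) implements a naive scatter-style transposed convolution. Each input element writes K products into overlapping output windows, so partial sums must be buffered until all contributions arrive, and different output positions receive different numbers of accumulations.

```python
# Illustrative sketch only (not the proposed FTA): naive "scatter"
# transposed convolution (deconvolution) in 1-D, showing overlapped
# partial sums and the unbalanced per-output workload.

def transposed_conv1d(x, w, stride):
    K = len(w)
    out_len = (len(x) - 1) * stride + K
    y = [0.0] * out_len            # partial sums buffered until complete
    hits = [0] * out_len           # accumulation count per output position
    for i, xi in enumerate(x):     # each input scatters K products ...
        for k in range(K):
            y[i * stride + k] += xi * w[k]   # ... into overlapping windows
            hits[i * stride + k] += 1
    return y, hits

x = [1.0, 2.0, 3.0, 4.0]           # hypothetical input
w = [0.5, 1.0, 1.0, 0.5]           # hypothetical kernel, K=4, stride=2
y, hits = transposed_conv1d(x, w, stride=2)
print(hits)  # [1, 1, 2, 2, 2, 2, 2, 2, 1, 1] -> unbalanced accumulations
```

The uneven `hits` vector illustrates why fixed-schedule hardware idles on lightly loaded outputs, and the buffered `y` array is the extra partial-sum storage that the FTA is designed to remove.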
