Uni-OPU: An FPGA-Based Uniform Accelerator for Convolutional and Transposed Convolutional Networks

Yunxuan Yu,Lei He,Tiandong Zhao,Mingyu Wang,Kun Wang

doi:10.1109/tvlsi.2020.2995741

Abstract

In this article, we design the first full software/ hardware stack, called Uni-OPU , for an efficient uniform hardware acceleration of different types of transposed convolutional (TCONV) networks and conventional convolutional (CONV) networks. Specifically, a software compiler is provided to transform the computation of various TCONV, i.e., zero-inserting-based TCONV (zero-TCONV), nearest-neighbor resizing-based TCONV (NN-TCONV), and CONV layers into the same pattern. The compiler conducts the following optimizations: 1) eliminating up to 98.4% of operations in TCONV by making use of the fixed pattern of TCONV upsampling; 2) decomposing and reformulating TCONV and CONV into streaming parallel vector multiplication with a uniform address generation scheme and data flow pattern; and 3) efficient scheduling and instruction compilation to map networks onto a hardware processor. An instruction-based hardware acceleration processor is developed to efficiently speedup our uniform computation pattern with throughput up to 2.35 TOPS for the TCONV layer, consuming only 2.89 W dynamic power. We evaluate Uni-OPU on a benchmark set composed of six TCONV networks from different application fields. Extensive experimental results indicate that Uni-OPU is able to gain $1.45 \times $ to $3.68 \times $ superior power efficiency compared with state-of-the-art zero-TCONV accelerators. High acceleration performance is also achieved on NN-TCONV networks, the acceleration of which have not been explored before. In summary, we observe $1.90 \times $ and $1.63 \times $ latency reduction, as well as $15.04 \times $ and $12.43 \times $ higher power efficiency on zero-TCONV and NN-TCONV networks compared with Titan Xp GPU on average. To the best of our knowledge, ours is the first in-depth study to completely unify the computation process of zero-TCONV, NN-TCONV, and CONV layers.

Full Text