Abstract

We present a Vector Processing Engine (VPE) designed to accelerate Convolutional Neural Networks (CNNs) for semantic segmentation. Most CNN accelerators focus on classification; CNNs for semantic segmentation, however, incorporate special layer types. Our accelerator supports not only regular convolutional layers, but also dilated convolutions and convolutions combined with down- or up-sampling. These features are implemented in dedicated address generators that load the corresponding input vector from an input line buffer. The VPE is designed as a 64×64 array in which up to 64 output features and 64 input features of a convolutional layer can be unrolled in parallel. The array has a peak performance of 4.12 TOp/s and achieves 3.85 TOp/s on a CNN for semantic segmentation, resulting in an average utilization of 93%. The design is prototyped on a Virtex UltraScale+ device at a clock rate of 250 MHz. In addition to the overall architecture, we present the two-hot quantization scheme. A value in two-hot quantization can be regarded as a combination of two power-of-two values; hence, instead of bulky multipliers, two small bit-shifts are implemented. We design and implement dedicated arithmetic engines for this scheme and evaluate it on the comparatively complex task of semantic segmentation. We show that the performance of an 8-bit two-hot quantization scheme is only marginally lower than that of a regular 8-bit fixed-point variant.
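
To make the two-hot idea concrete, the following Python sketch approximates a weight by two signed power-of-two terms and then applies it to an activation using only bit-shifts and an addition. The exponent range, field layout, and helper names are illustrative assumptions; they do not reproduce the paper's actual 8-bit encoding or the VPE's arithmetic engines.

# Illustrative sketch of two-hot quantization, assuming a weight is
# approximated as s1*2^e1 + s2*2^e2 with signs in {-1, 0, +1}.
# The exponent range below is an assumption for illustration only.

def two_hot_approx(w, exp_range=range(-8, 1)):
    """Return (s1, e1, s2, e2) minimizing |w - (s1*2^e1 + s2*2^e2)|."""
    best, best_err = (0, 0, 0, 0), abs(w)
    for s1 in (-1, 0, 1):
        for e1 in exp_range:
            for s2 in (-1, 0, 1):
                for e2 in exp_range:
                    err = abs(w - (s1 * 2.0**e1 + s2 * 2.0**e2))
                    if err < best_err:
                        best, best_err = (s1, e1, s2, e2), err
    return best

def shift_multiply(x, s1, e1, s2, e2):
    """Multiply integer activation x by a two-hot weight using shifts only.

    Negative exponents become right-shifts; in hardware this corresponds
    to two small shifters feeding an adder instead of a full multiplier.
    """
    def term(s, e):
        if s == 0:
            return 0
        shifted = (x << e) if e >= 0 else (x >> -e)
        return s * shifted
    return term(s1, e1) + term(s2, e2)

# Example: approximate a weight of 0.59 and apply it to the activation 64.
s1, e1, s2, e2 = two_hot_approx(0.59)              # e.g. 2^-1 + 2^-4
print(s1, e1, s2, e2, shift_multiply(64, s1, e1, s2, e2))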
