FPGA Implementation of 3-bit Quantized CNN for Semantic Segmentation

Masayuki Miyama

doi:10.1088/1742-6596/1729/1/012004

Abstract

Semantic segmentation takes an image as input and assigns a category to each pixel. CNN-based semantic segmentation achieves high accuracy, but its floating-point computation consumes a large amount of power. We adopted UNET as the semantic segmentation CNN and modified it for FPGA implementation. We quantized both the weights and the activations of the network down to 3 bits. We then devised a dedicated hardware architecture for the quantized CNN and implemented it on an FPGA. The circuit performs forward propagation using only internal memory, which eliminates high-power external memory accesses. It is a stall-free, pixel-by-pixel pipeline that computes 3 × 3 pixel convolutions for 8 rows, 16 input channels, and 16 output channels in parallel. At an operating frequency of 300 MHz, the convolution performance is 11 TOPS (tera-operations per second).
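The throughput figure can be checked against the stated parallelism with a short calculation. The sketch below is illustrative only: it assumes that each multiply-accumulate counts as two operations (one multiply, one add) and that the pipeline issues a full set of convolutions every clock cycle, as "stall-free" suggests. The constant and function names are hypothetical and do not come from the paper.

```python
# Parallelism stated in the abstract.
ROWS = 8            # image rows processed in parallel
IN_CHANNELS = 16    # input channels processed in parallel
OUT_CHANNELS = 16   # output channels processed in parallel
KERNEL_H = 3        # convolution kernel height
KERNEL_W = 3        # convolution kernel width
CLOCK_HZ = 300_000_000  # 300 MHz operating frequency

# Assumption (not stated in the abstract): one MAC = 2 operations.
OPS_PER_MAC = 2


def macs_per_cycle() -> int:
    """Multiply-accumulates issued per clock cycle at full utilization."""
    return ROWS * IN_CHANNELS * OUT_CHANNELS * KERNEL_H * KERNEL_W


def peak_tera_ops_per_second() -> float:
    """Peak convolution throughput in tera-operations per second."""
    ops_per_second = macs_per_cycle() * OPS_PER_MAC * CLOCK_HZ
    return ops_per_second / 1e12


if __name__ == "__main__":
    print(f"MACs per cycle: {macs_per_cycle()}")
    print(f"Peak throughput: {peak_tera_ops_per_second():.2f} TOPS")
```

Under these assumptions the circuit issues 18,432 multiply-accumulates per cycle, which gives roughly 11.06 tera-operations per second at 300 MHz, consistent with the reported figure.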

Full Text