Overcoming resource underutilization in spatial CNN accelerators

Yongming Shen,Michael Ferdman,Peter Milder

doi:10.1109/fpl.2016.7577315

Abstract

Convolutional neural networks (CNNs) are revolutionizing a variety of machine learning tasks, but they present significant computational challenges. Recently, FPGA-based accelerators have been proposed to improve the speed and efficiency of CNNs. Current approaches construct an accelerator optimized to maximize the overall throughput of iteratively computing the CNN layers. However, this approach leads to dynamic resource underutilization because the same accelerator is used to compute CNN layers of radically varying dimensions. We present a new CNN accelerator design that improves the dynamic resource utilization. Using the same FPGA resources, we build multiple accelerators, each specialized for specific CNN layers. Our design achieves 1.3× higher throughput than the state of the art when evaluating the convolutional layers of the popular AlexNet CNN on a Xilinx Virtex-7 FPGA.

Full Text