Abstract

Deep neural networks, especially Convolutional Neural Networks (CNNs), have driven unprecedented improvements in machine learning tasks in computer vision research. Most research focuses on improving inference accuracy and designing networks capable of reaching a target accuracy for a specific application. These networks are typically implemented on relatively unconstrained devices, such as workstation Graphics Processing Units (GPUs), so the design and optimization techniques do not scale well to low-power, resource-constrained applications such as mobile, automotive, or the Internet of Things (IoT). While effort has shifted toward acceleration frameworks for constrained platforms such as Field Programmable Gate Arrays (FPGAs) and System-on-Chip (SoC) devices, obtaining energy-efficient implementations without a significant drop in accuracy remains more of an art than a science. In this paper, we break down CNNs, analyze each layer independently, and quantify its impact on memory footprint, power consumption, and latency. We analyze these properties for four common CNN architectures: LeNet, AlexNet, VGG-11, and VGG-16. We optimize and implement each network on a low-power computing platform suitable for constrained environments, and validate the networks using the MNIST and CIFAR-10 datasets. We show that modifying specific network design parameters, such as filter size, the number of fully connected layers, and the subsampling technique, has a considerable impact on overall performance and efficiency, enabling informed trade-offs and optimization. We demonstrate up to an 11x reduction in power consumption compared to another FPGA platform with 96x more memory capacity, while maintaining state-of-the-art classification accuracy.
