Abstract

Convolutional neural networks (CNNs) require significant computing power during inference. Smartphones, for example, may not run a facial recognition system or search algorithm smoothly due to a lack of resources and supporting hardware. Methods for reducing memory size and increasing execution speed have been explored, but choosing effective techniques for an application requires extensive knowledge of the network architecture. This paper proposes a general approach to preparing a compressed deep neural network processor for inference with minimal additions to existing microprocessor hardware. To show the benefits of the proposed approach, an example CNN for synthetic aperture radar target classification is modified and complementary custom processor instructions are designed. The modified CNN is examined to show the effects of the modifications, and the custom processor instructions are profiled to illustrate the potential performance increase from the new extended instructions.
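
The compression referred to above typically amounts to storing weights and intermediate data in narrow fixed-point formats. As a minimal sketch, assuming a simple symmetric per-layer 8-bit scheme (the paper's exact quantization method is not reproduced here), the conversion of floating-point weights could look like the following C routine:

    /* Hypothetical sketch: symmetric per-layer 8-bit quantization of weights.
       Illustrates the float-to-fixed-point conversion a compressed inference
       processor with narrow multipliers would rely on; not the paper's exact scheme. */
    #include <math.h>
    #include <stdint.h>
    #include <stdlib.h>

    void quantize_weights_int8(const float *w, int8_t *q, size_t n, float *scale)
    {
        float max_abs = 0.0f;
        for (size_t i = 0; i < n; ++i) {
            float a = fabsf(w[i]);
            if (a > max_abs)
                max_abs = a;
        }
        *scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;  /* one scale per layer */
        for (size_t i = 0; i < n; ++i) {
            long v = lroundf(w[i] / *scale);
            if (v > 127)  v = 127;
            if (v < -128) v = -128;
            q[i] = (int8_t)v;
        }
    }

Under this kind of scheme each stored weight occupies one byte instead of four, which is where the memory and bandwidth savings described above come from.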

Highlights

  • Convolutional neural networks (CNNs) have become increasingly popular for image classification and a variety of other machine learning tasks

  • Although CNNs are more efficient than other classifier types that can be trained with large datasets, they are still computationally intensive applications

  • New processor instructions to calculate these layers efficiently alongside a Single Instruction Multiple Data (SIMD) Multiply and Accumulate (MAC) instruction for fully connected layers and 1 × 1 convolution can cover all basic layers of a modern CNN and enable fast, low-power inference


Summary

Introduction

Convolutional neural networks (CNNs) have become increasingly popular for image classification and a variety of other machine learning tasks. In addition to computation requirements, memory access penalties significantly impact overall execution time and power consumption. Converting the weights and intermediate-layer data of a CNN to smaller bit-width representations drastically reduces the number of memory accesses and increases execution speed in real systems. For the same chip area, multiple small fixed-point multipliers increase the computational throughput for convolution and fully connected layers. GPUs are the preferred platform for training and running CNNs in research because their high image throughput hides memory access penalties. New processor instructions that calculate these layers efficiently, alongside a SIMD MAC instruction for fully connected layers and 1 × 1 convolution, can cover all basic layers of a modern CNN and enable fast, low-power inference. The computational speedup and gate counts of the custom instructions provide a basis for the viability of the proposed approach in a real application-specific instruction set processor (ASIP) application.
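
As a minimal illustration of what such a SIMD MAC instruction would compute (the lane count, operand widths, and packing here are assumptions rather than the paper's ISA encoding), a fully connected layer, or a 1 × 1 convolution at a single pixel, reduces to a chain of four-lane 8-bit multiply-accumulates:

    /* Illustrative software model of a 4-lane 8-bit SIMD multiply-accumulate:
       four weight/activation pairs are multiplied and summed into a 32-bit
       accumulator in one operation. Lane count and widths are assumptions. */
    #include <stdint.h>
    #include <stddef.h>

    static int32_t simd_mac4(int32_t acc, const int8_t w[4], const int8_t x[4])
    {
        for (int lane = 0; lane < 4; ++lane)
            acc += (int32_t)w[lane] * (int32_t)x[lane];
        return acc;
    }

    /* A fully connected layer (or a 1x1 convolution at one output position)
       is then a chain of such MACs over the quantized input vector. */
    int32_t fc_dot(const int8_t *w, const int8_t *x, size_t len)
    {
        int32_t acc = 0;
        size_t i;
        for (i = 0; i + 4 <= len; i += 4)
            acc = simd_mac4(acc, &w[i], &x[i]);
        for (; i < len; ++i)            /* handle any tail elements */
            acc += (int32_t)w[i] * (int32_t)x[i];
        return acc;
    }

Folding four narrow multiplies and an accumulation into one instruction is what lets the fixed-point datapath keep the multipliers busy without widening the memory interface.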

Convolutional Neural Network Architecture
Custom Instruction Implementation Details
Experimental Results
Related Work