Abstract

Fueled by the ImageNet Large Scale Visual Recognition Challenge and Common Objects in Context competitions, the convolutional neural network (CNN) has become important in computer vision and natural language processing. However, state-of-the-art CNNs are computationally and memory-intensive, so energy-efficient implementation on embedded platforms is challenging. Recently, VGGNet and ResNet showed that deep neural networks with more convolution layers and only a few fully connected layers can achieve lower error rates; reducing the complexity of the convolution layers is therefore of utmost importance. In this paper, we evaluate three variations of convolution, namely direct convolution (Direct-Conv), fast Fourier transform (FFT)-based convolution (FFT-Conv), and FFT overlap-and-add convolution (FFT-OVA-Conv), in terms of computational complexity and memory storage requirements for popular CNNs on embedded hardware. We implemented these three techniques for ResNet-20 with the CIFAR-10 data set on a low-power domain-specific many-core architecture called power-efficient nanoclusters (PENC), an NVIDIA Jetson TX1 graphics processing unit (GPU), an ARM Cortex A53 CPU, and the SPARse Convolutional NETwork (SPARCNet) accelerator on a Zynq 7020 FPGA to explore the tradeoffs between software and hardware implementation, domain-specific logic and instructions, and the various forms of parallelism across these architectures. Results are evaluated and compared with respect to per-layer throughput, energy consumption, and execution time for the three methods. SPARCNet deployed on the Zynq FPGA achieved a 42-ms runtime with 135-mJ energy consumption and a 10.8-MB/s per-layer throughput using FFT-Conv for ResNet-20. Using the built-in FFT instruction in PENC, FFT-OVA-Conv runs $2.9\times$ and $1.65\times$ faster and achieves $6.8\times$ and $2.5\times$ higher throughput per watt than Direct-Conv and FFT-Conv, respectively.
On the ARM A53 CPU, FFT-OVA-Conv achieves $3.36\times$ and $1.38\times$ improvement in execution time and $2.72\times$ and $1.32\times$ higher throughput than Direct-Conv and FFT-Conv, respectively. On the TX1 GPU, FFT-Conv is $1.9\times$ faster, $2.2\times$ more energy-efficient, and achieves $5.6\times$ higher per-layer throughput than Direct-Conv. PENC is $10\,916\times$ and $1.8\times$ faster, $5053\times$ and $4.3\times$ more energy-efficient, and achieves $7.5\times$ and $1.2\times$ higher per-layer throughput than the ARM A53 CPU and TX1 GPU, respectively.
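To make the three compared techniques concrete, the following is a minimal 1-D NumPy sketch (not the paper's 2-D CNN implementation) of Direct-Conv, FFT-Conv, and FFT-OVA-Conv; the block size and function names are illustrative assumptions. All three compute the same linear convolution, differing only in arithmetic cost and working-set size, which is the tradeoff the paper measures on each platform.

```python
import numpy as np

def direct_conv(x, h):
    """Direct (time-domain) convolution: O(N*M) multiply-accumulates."""
    y = np.zeros(len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        y[i:i + len(h)] += xi * h
    return y

def fft_conv(x, h):
    """FFT-based convolution: zero-pad both signals to the full output
    length, multiply pointwise in the frequency domain, invert."""
    n = len(x) + len(h) - 1
    return np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(h, n), n)

def fft_ova_conv(x, h, block=64):
    """FFT overlap-and-add: convolve fixed-size blocks of x with h using
    small FFTs, then sum the overlapping tails of adjacent blocks."""
    m = len(h)
    n_fft = block + m - 1            # small FFT size, no circular wrap
    H = np.fft.rfft(h, n_fft)        # filter spectrum computed once
    y = np.zeros(len(x) + m - 1)
    for start in range(0, len(x), block):
        seg = x[start:start + block]
        yb = np.fft.irfft(np.fft.rfft(seg, n_fft) * H, n_fft)
        y[start:start + len(seg) + m - 1] += yb[:len(seg) + m - 1]
    return y
```

FFT-OVA-Conv keeps each transform short (block + filter − 1 points) instead of one large FFT over the whole signal, which is why it maps well onto PENC's built-in FFT instruction and small per-core memories.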
