Abstract

Convolutional Neural Networks (CNNs) require massive parallelism due to the high-precision floating-point arithmetic operations they perform, so their demand for processing power is significantly higher than what a standard CPU can offer. This has traditionally made CNNs better suited to running on a Graphics Processing Unit (GPU). However, GPUs consume much more power than CPUs, rendering them impractical for implementing CNNs in Edge AI (Artificial Intelligence), where restraining power consumption is paramount. FPGAs (Field-Programmable Gate Arrays), on the other hand, are better suited to AI computing at the edge because they consume much less power than GPUs and even CPUs. Moreover, GPUs and CPUs are poorly suited to real-time AI applications, which demand both high throughput and low latency; FPGAs excel at both. The purpose of this paper is to provide a study of the performance of the FPGA as the most suitable platform for AI-based computing at the edge. To this end, we chose AlexNet, a popular CNN image classifier, and present a case study on four different platforms: CPU, GPU, embedded RISC core, and FPGA fabric. We then quantitatively measure and compare performance on all these platforms in terms of inference time (the time needed to classify an image). Inference time on the FPGA is reduced by almost 64, 1.6, and 1.1 times compared to a dual-core ARM, an i5-6400 CPU, and an Nvidia GPU, respectively.
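
The abstract does not detail the measurement setup, but the following is a minimal sketch of how per-image inference time for AlexNet could be timed on the CPU and GPU platforms, assuming PyTorch and torchvision are available; the batch size, warm-up count, and run count are illustrative assumptions, not values from the study.

# Minimal sketch (not the authors' benchmark) of timing AlexNet inference
# per image on CPU vs. GPU with PyTorch/torchvision. Warm-up and run counts
# below are illustrative assumptions, not values taken from the paper.
import time
import torch
from torchvision.models import alexnet

def time_inference(device: str, warmup: int = 5, runs: int = 50) -> float:
    model = alexnet().eval().to(device)                 # random weights suffice for timing
    image = torch.randn(1, 3, 224, 224, device=device)  # one 224x224 RGB input
    with torch.no_grad():
        for _ in range(warmup):                         # warm-up excludes one-time setup costs
            model(image)
        if device == "cuda":
            torch.cuda.synchronize()                    # flush queued GPU work before timing
        start = time.perf_counter()
        for _ in range(runs):
            model(image)
        if device == "cuda":
            torch.cuda.synchronize()                    # wait for all timed GPU work to finish
    return (time.perf_counter() - start) / runs         # mean seconds per image

print(f"CPU: {time_inference('cpu') * 1e3:.2f} ms/image")
if torch.cuda.is_available():
    print(f"GPU: {time_inference('cuda') * 1e3:.2f} ms/image")

On the GPU, synchronizing before and after the timed loop is what makes the wall-clock measurement meaningful, since CUDA kernel launches are asynchronous and would otherwise return before the work completes.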
