A Resource Efficient Integer-Arithmetic-Only FPGA-Based CNN Accelerator for Real-Time Facial Emotion Recognition

Jaemyung Kim, Jin-Ku Kang, Yongwoo Kim

doi:10.1109/access.2021.3099075

Open Access

https://doi.org/10.1109/access.2021.3099075

Journal: IEEE Access	Publication Date: Jan 1, 2021
Citations: 24	License type: CC BY 4.0

Affiliation: Inha University, Sangmyung University

Abstract

Recently, much research has been conducted on facial emotion recognition using convolutional neural networks (CNNs), which show excellent performance in computer vision. Obtaining high classification accuracy typically requires a CNN architecture with many parameters and high computational complexity, which is not suitable for embedded systems where hardware resources are limited. In this paper, we present a lightweight CNN architecture optimized for embedded systems. The proposed CNN architecture has a small memory footprint and low computational complexity. Furthermore, we propose a novel hardware-friendly quantization method that uses only integer arithmetic. The method maps the scale factors to power-of-two terms and replaces the multiplication and division operations involving scale factors with shift operations. To improve the generalization and classification performance of the CNN, we create FERPlus-A, a new training dataset built using a variety of image processing algorithms. The CNN is trained on FERPlus-A and then quantized. The quantized CNN parameters occupy about 0.39 MB, and the number of operations is about 28 M integer operations (IOPs). Evaluated on the FERPlus test dataset, the quantized CNN, which uses only integer arithmetic, achieves a classification accuracy of approximately 86.58%, higher than that of other lightweight CNNs in prior studies. The proposed integer-arithmetic-only CNN architecture is implemented on the Xilinx ZC706 SoC platform for real-time facial emotion recognition by applying parallelism strategies and efficient data caching strategies. The resulting FPGA-based CNN accelerator achieves about 10 frames per second (FPS) at 250 MHz and consumes 2.3 W.
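
The figures quoted in the abstract can be combined into a few back-of-the-envelope metrics. The sketch below is only arithmetic on those approximate numbers. It assumes that the 28 M IOPs are spent once per frame and that the 2.3 W figure applies while running at 10 FPS. The function name `derived_metrics` and the dictionary keys are hypothetical and do not come from the paper.

```python
# Approximate figures quoted in the abstract.
OPS_PER_FRAME = 28e6   # integer operations (IOPs) per inference
FPS = 10               # frames per second
CLOCK_HZ = 250e6       # accelerator clock frequency
POWER_W = 2.3          # reported power consumption


def derived_metrics(ops_per_frame, fps, clock_hz, power_w):
    """Rough derived metrics, assuming exactly one inference per frame."""
    ops_per_second = ops_per_frame * fps
    return {
        "giops": ops_per_second / 1e9,               # sustained throughput
        "energy_per_frame_j": power_w / fps,         # joules per frame
        "cycles_per_frame": clock_hz / fps,          # clock cycles available per frame
        "ops_per_cycle": ops_per_second / clock_hz,  # average IOPs per clock cycle
        "giops_per_watt": ops_per_second / 1e9 / power_w,
    }


metrics = derived_metrics(OPS_PER_FRAME, FPS, CLOCK_HZ, POWER_W)
for name, value in metrics.items():
    print(f"{name}: {value:.4g}")
```

Under these assumptions the accelerator sustains roughly 0.28 GIOPS and spends about 0.23 J per frame. Because the inputs are themselves rounded, the results are order-of-magnitude estimates only.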

Highlights

Today, computers play a central role in industry and society, and are rapidly becoming a part of everyday life.
In this paper, a new training dataset is presented that was created by combining various image processing algorithms to improve the performance of facial emotion recognition.
Log level threshold quantization (LLTQ) is proposed. This is a novel hardware-friendly quantization method that uses only integer arithmetic and addresses the problems of existing methods.
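
The general idea of power-of-two scale factors can be illustrated with a minimal sketch. This is not the paper's LLTQ procedure, whose details are not given here. It is a generic example, assuming 8-bit signed values and scale factors of the form 2^-n, and all function names (`pow2_shift`, `quantize`, `requantize`) are hypothetical. It shows how rescaling an integer accumulator reduces to an add and a bit shift once every scale factor is a power of two.

```python
import math


def pow2_shift(scale):
    """Return n such that 2**-n is the power of two nearest to `scale` in the log2 domain."""
    return -round(math.log2(scale))


def clamp(q, bits=8):
    """Saturate q to the signed range of the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    return max(-qmax - 1, min(qmax, q))


def quantize(x, shift, bits=8):
    """Offline step: map a real value to an integer with scale 2**-shift."""
    return clamp(round(x * 2 ** shift), bits)


def requantize(acc, shift, bits=8):
    """Rescale an integer accumulator by 2**-shift using only add and shift."""
    if shift > 0:
        # Adding half of the divisor before shifting rounds to nearest.
        acc = (acc + (1 << (shift - 1))) >> shift
    elif shift < 0:
        acc = acc << -shift
    return clamp(acc, bits)


# Illustrative shift amounts (assumed, not taken from the paper).
W_SHIFT, X_SHIFT, OUT_SHIFT = 6, 4, 5

weights = [0.5, -0.25, 0.125]
inputs = [1.0, 2.0, -1.5]

qw = [quantize(w, W_SHIFT) for w in weights]
qx = [quantize(x, X_SHIFT) for x in inputs]

# The integer dot product carries scale 2**-(W_SHIFT + X_SHIFT).
acc = sum(a * b for a, b in zip(qw, qx))

# Moving to the output scale 2**-OUT_SHIFT is a single right shift.
q_out = requantize(acc, W_SHIFT + X_SHIFT - OUT_SHIFT)

print(q_out, q_out / 2 ** OUT_SHIFT)
```

At inference time, only `requantize` and the integer multiply-accumulate would run on the hardware. The floating-point calls in `pow2_shift` and `quantize` stand in for an offline calibration step.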

Summary

Introduction

Computers play a central role in industry and society, and are rapidly becoming a part of everyday life. As a result, the need for research on the interaction between humans and computers is increasing. For this interaction to be smooth, a computer must be able to analyze human intention and respond accordingly. Emotions that appear in facial expressions are a universal and effective way of expressing human intentions. Analyzing human intentions through the emotions revealed in faces is called facial emotion recognition, and it is used in diverse fields such as the automotive and robotics industries. To accurately understand the emotions displayed on human faces, a computer must recognize the face and automatically classify the emotions into specific emotion groups.

Methods

Results

Conclusion