Abstract

In EdgeAI embedded devices that exploit reinforcement learning (RL), it is essential to reduce the number of actions taken by the agent in the real world and to minimize the compute-intensive policy-learning process. Convolutional autoencoders (AEs) have been shown to speed up policy learning considerably when attached to the RL agent, by compressing the high-dimensional input data into a small latent representation that is fed to the RL agent. Despite reducing the policy-learning time, the AE adds significant computational and memory complexity to the model, increasing both the total computation and the model size. In this article, we propose a model that speeds up the policy-learning process of an RL agent with an AE neural network that uses binary and ternary precision to address this complexity overhead without deteriorating the policy the RL agent learns. Binary Neural Networks (BNNs) and Ternary Neural Networks (TNNs) compress weights into 1-bit and 2-bit representations, which significantly reduces the model size and memory footprint and simplifies the multiply-accumulate (MAC) operations. We evaluate the performance of our model in three RL environments, DonkeyCar, Miniworld Sidewalk, and Miniworld Object Pickup, which emulate various real-world applications with different levels of complexity. With proper hyperparameter optimization and architecture exploration, TNN models achieve nearly the same average reward, Peak Signal-to-Noise Ratio (PSNR), and Mean Squared Error (MSE) as the full-precision model while reducing the model size by 10x compared to full precision and 3x compared to BNNs. In BNN models, however, the average reward drops by 12%-25% compared to full precision even after increasing the model size by 4x. We designed and implemented a scalable hardware accelerator that is configurable in the number of processing elements (PEs) and the memory data width to achieve the best power, performance, and energy-efficiency trade-off for EdgeAI embedded devices. The proposed hardware implemented on an Artix-7 FPGA dissipates 250 μJ of energy while meeting the 30 frames per second (FPS) throughput requirement, and it can be configured to reach an efficiency of over 1 TOP/J on the FPGA implementation. Synthesized and placed-and-routed in a 14 nm FinFET ASIC technology, the design brings the energy dissipation down to 3.9 μJ with a maximum throughput of 1,250 FPS. Compared to state-of-the-art TNN implementations on the same target platforms, our hardware is 5x more energy efficient on FPGA and 4.4x (2.2x when technology scaled) more energy efficient on ASIC.
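As a minimal, illustrative sketch (not the authors' implementation), the PyTorch-style functions below show how a full-precision weight tensor can be binarized to a scaled {-1, +1} representation or ternarized to {-α, 0, +α}. The 0.7·mean(|W|) threshold is a common TWN-style heuristic assumed here, not necessarily the one used in the paper; the point is that once weights take only these values, each MAC reduces to an addition, a subtraction, or a skip.

```python
import torch

def binarize(w: torch.Tensor) -> torch.Tensor:
    """Binarize weights to a scaled {-1, +1} representation (XNOR-Net-style scaling)."""
    alpha = w.abs().mean()           # per-tensor scaling factor
    return alpha * torch.sign(w)     # sign() maps weights to {-1, +1}; exact zeros are rare

def ternarize(w: torch.Tensor) -> torch.Tensor:
    """Ternarize weights to {-alpha, 0, +alpha} using an assumed TWN-style threshold."""
    delta = 0.7 * w.abs().mean()                               # pruning threshold (heuristic)
    mask = (w.abs() > delta).float()                           # small weights become 0
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1)   # scale from surviving weights
    return alpha * torch.sign(w) * mask

# With {-1, 0, +1} weights, the MAC array in a hardware accelerator only needs
# adders/subtractors and a skip path, which is what enables the simplified PEs.
```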

Highlights

  • Reinforcement Learning is a goal-oriented paradigm of machine learning in which an agent tries to learn a policy to complete complex tasks by trial and error

  • Once the optimum configuration is established for each case through design optimization, the base configuration is quantized to binary and ternary neural networks to measure the performance of the reinforcement learning (RL) agent, which serves as the basis of comparison across all cases

  • This process is replicated by ternarizing and binarizing the full-precision neural network of the autoencoder to measure the reward achieved by the RL agent

Summary

INTRODUCTION

Reinforcement Learning is a goal-oriented paradigm of machine learning in which an agent tries to learn a policy to complete complex tasks by trial and error. Despite achieving great success in unsupervised problems such as robotics and autonomous navigation [1], training the agent is a compute-intensive and time-consuming process because of the large number of trial-and-error actions required to learn new policies. The new events are typically new images observed by the agent; these are initially high-dimensional data whose complexity impacts the performance of the learning process. Reducing the complexity of both the high-dimensional event data (images) and the number of actions can facilitate the learning process of the agent and decrease the hardware complexity, leading to improved power dissipation, latency, and efficiency during model deployment. The autoencoder models with compressed neural networks are assessed for their performance with the RL agent in three environments of varying complexity.
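To make the data flow concrete, the sketch below shows a convolutional encoder compressing an image observation into a small latent vector that is then consumed by the RL policy. The layer sizes, the 64x64 input resolution, and the 32-dimensional latent are illustrative assumptions, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Convolutional encoder: compresses an image observation into a small latent vector."""
    def __init__(self, latent_dim: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),   # 64x64 -> 31x31
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # 31x31 -> 14x14
            nn.Conv2d(64, 128, kernel_size=4, stride=2), nn.ReLU(), # 14x14 -> 6x6
        )
        self.fc = nn.Linear(128 * 6 * 6, latent_dim)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        z = self.conv(obs).flatten(start_dim=1)
        return self.fc(z)

# The RL agent learns its policy over the small latent vector instead of the raw
# image, which is what shortens the policy-learning time.
encoder = Encoder()
obs = torch.rand(1, 3, 64, 64)   # one example camera frame
latent = encoder(obs)            # shape: (1, 32)
```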

AND RELATED WORKS
LOW BIT WIDTH AUTOENCODER NETWORK
TERNARY NEURAL NETWORKS
ENVIRONMENT SETUP
CASE STUDY 1
CASE STUDY 2
EXPERIMENTAL RESULTS
CONCLUSION