Abstract

In this paper we propose an efficient hardware architecture that implements the Q-Learning algorithm, suitable for real-time applications. Its main features are low power consumption, high throughput and limited hardware resource usage. We also propose a technique based on approximated multipliers to reduce the hardware complexity of the algorithm. We implemented the design on a Xilinx Zynq Ultrascale+ MPSoC ZCU106 Evaluation Kit. The implementation results are evaluated in terms of hardware resources, throughput and power consumption. The architecture is compared with state-of-the-art Q-Learning hardware accelerators presented in the literature, obtaining better results in terms of speed, power and hardware resources. Experiments using different sizes for the Q-Matrix and different wordlengths for the fixed-point arithmetic are presented. With a Q-Matrix of size $8\times4$ (8 bit data) we achieved a throughput of 222 MSPS (Mega Samples Per Second) and a dynamic power consumption of 37 mW, while with a Q-Matrix of size $256\times16$ (32 bit data) we achieved a throughput of 93 MSPS and a power consumption of 611 mW. Due to the small amount of hardware resources required by the accelerator, our system is suitable for multi-agent IoT applications. Moreover, the architecture can be used to implement the SARSA (State-Action-Reward-State-Action) Reinforcement Learning algorithm with minor modifications.
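The abstract mentions a technique based on approximated multipliers to reduce hardware complexity, but this summary does not specify the multiplier design. As a rough, non-authoritative software illustration of the general idea, the sketch below shows a simple truncation-based approximate multiply for Q16.16 fixed-point values; the name `approx_mul_q16`, the `TRUNC_BITS` parameter and the truncation scheme itself are assumptions for illustration only, not the paper's actual circuit.

```c
#include <stdint.h>

/* Generic truncation-based approximate multiply (illustrative only):
 * drop the TRUNC_BITS least-significant bits of each operand before
 * multiplying, trading accuracy for a smaller partial-product array.
 * Operands and result are in Q16.16 fixed-point format.            */
#define TRUNC_BITS 4

static inline int32_t approx_mul_q16(int32_t a, int32_t b)
{
    int32_t at = a >> TRUNC_BITS;            /* truncated operands           */
    int32_t bt = b >> TRUNC_BITS;
    int64_t p  = (int64_t)at * (int64_t)bt;  /* reduced-width product        */

    /* Exact Q16.16 multiply would be (a*b) >> 16; the two truncations
     * already removed 2*TRUNC_BITS bits, so shift by the remainder.  */
    return (int32_t)(p >> (16 - 2 * TRUNC_BITS));
}
```

The approximation error grows with TRUNC_BITS, so in a real design this parameter would be chosen against the wordlength and the accuracy the learning task can tolerate.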

Highlights

  • Reinforcement Learning (RL) is a Machine Learning (ML) approach used to train an entity, called agent, to accomplish a certain task [1]

  • The performance of software-based implementations is the main limitation to further development of such systems, and hardware accelerators based on FPGAs or ASICs can represent an efficient solution for implementing RL algorithms

  • In 2017, Su et al. [24] proposed another Deep Q-Learning hardware implementation based on an Intel Arria-10 FPGA


Summary

INTRODUCTION

Reinforcement Learning (RL) is a Machine Learning (ML) approach used to train an entity, called the agent, to accomplish a certain task [1]. The reward (or reinforcement) is a quality figure for the last action performed by the agent and is represented as a positive or negative number. Through this iterative process, the agent learns an optimal action-selection policy to accomplish its task. These kinds of applications require powerful computing platforms able to process very large amounts of data as fast as possible and with limited power consumption. For these reasons, the performance of software-based implementations is the main limitation to further development of such systems, and hardware accelerators based on FPGAs or ASICs can represent an efficient solution for implementing RL algorithms. Q-Learning stores the learned values in the Q-Matrix. The size of this matrix is N × Z, where N is the number of possible states in which the agent can sense the environment and Z is the number of possible actions that the agent can perform. This means that Q-Learning operates in a discrete state-action space S × A. In [16] it is proved that knowledge of the Q-Matrix suffices to extract the optimal action-selection policy for an RL agent.
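For reference, the standard tabular Q-Learning update over an N × Z Q-Matrix can be sketched in software as follows. This is a minimal sketch: the matrix sizes, the identifiers (`q_update`, `N_STATES`, `N_ACTIONS`) and the use of floating-point arithmetic are illustrative assumptions; the paper's accelerator realizes this update in fixed-point hardware.

```c
#include <stddef.h>

#define N_STATES  8   /* N: number of states the agent can be in (example size)  */
#define N_ACTIONS 4   /* Z: number of actions available to the agent             */

static float Q[N_STATES][N_ACTIONS];   /* the Q-Matrix, Q(s, a) */

/* One Q-Learning update step:
 *   Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
 * s  : current state, a : action taken, r : reward received,
 * s2 : next state observed after taking action a.                   */
void q_update(size_t s, size_t a, float r, size_t s2,
              float alpha, float gamma)
{
    /* max over the next state's row of the Q-Matrix */
    float max_next = Q[s2][0];
    for (size_t k = 1; k < N_ACTIONS; ++k)
        if (Q[s2][k] > max_next)
            max_next = Q[s2][k];

    Q[s][a] += alpha * (r + gamma * max_next - Q[s][a]);
}
```

The inner loop corresponds to the max-selection stage (the MAX BLOCK discussed later), while the final line corresponds to the update datapath, whose multiplications are the natural target for the approximated multipliers mentioned in the abstract.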

RELATED WORK
PROPOSED ARCHITECTURE
MAX BLOCK
CONCLUSION
