Abstract

Reinforcement learning (RL) is increasingly being used to optimize resource-constrained wireless Internet of Things (IoT) devices. However, existing RL algorithms that are lightweight enough to run on these devices, such as $Q$-learning, converge too slowly to adapt effectively to the information source and channel dynamics they experience, while deep RL algorithms are too complex to implement on them. By integrating basic models of the IoT system into the learning process, so-called postdecision state (PDS)-based RL can achieve faster convergence than $Q$-learning at lower complexity than deep RL; however, its complexity may still hinder real-time, energy-efficient operation on IoT devices. In this article, we develop efficient hardware accelerators for PDS-based RL. We first develop an arithmetic hardware acceleration architecture and then propose a stochastic computing (SC)-based reconfigurable hardware architecture. By using the simple bitwise computations enabled by SC, we eliminate the costly multiplications involved in PDS learning, which simultaneously reduces hardware area and power consumption. We show that computational efficiency can be further improved by using extremely short stochastic representations without sacrificing learning performance. We demonstrate the proposed approach on a simulated wireless IoT sensor that must transmit delay-sensitive data over a fading channel while minimizing its energy consumption. Our experimental results show that the arithmetic accelerator is $5.3\times$ faster than $Q$-learning and $2.6\times$ faster than a baseline hardware architecture, while the proposed SC-based architecture further reduces the critical path of the arithmetic accelerator by 87.9%.
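As a brief illustration of the SC principle the abstract invokes (not the paper's actual hardware design): in unipolar stochastic computing, a value in [0, 1] is encoded as a bitstream whose fraction of 1s equals that value, so the product of two independently generated streams reduces to a bitwise AND. The sketch below demonstrates this in software; the stream length and test values are arbitrary choices for illustration.

```python
import random

def to_stochastic(p, n, rng):
    """Encode p in [0, 1] as a unipolar stochastic bitstream of
    length n: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def sc_multiply(xs, ys):
    """Unipolar SC multiplication: the bitwise AND of two independent
    bitstreams approximates the product of the encoded values."""
    return [a & b for a, b in zip(xs, ys)]

def decode(bits):
    """Recover the encoded value as the fraction of 1s in the stream."""
    return sum(bits) / len(bits)

rng = random.Random(0)
n = 256  # stream length; shorter streams trade accuracy for speed
x, y = 0.6, 0.5
est = decode(sc_multiply(to_stochastic(x, n, rng), to_stochastic(y, n, rng)))
print(f"SC estimate of {x} * {y}: {est:.3f} (exact: {x * y})")
```

The appeal for hardware is that the AND gate replaces a full multiplier, and shortening the bitstream (as the abstract notes for PDS learning) directly shortens computation time at the cost of estimation variance.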
