Abstract

Reinforcement learning (RL) using deep Q-networks (DQNs) has shown performance beyond the human level in a number of complex problems. In addition, many studies have focused on bio-inspired hardware-based spiking neural networks (SNNs), given the capability of these technologies to realize both parallel operation and low power consumption. Here, we propose an on-chip training method for DQNs applicable to hardware-based SNNs. Although the conventional backpropagation (BP) algorithm is approximated, a performance evaluation based on two simple games shows that the proposed system achieves performance similar to that of a software-based system. The proposed training method minimizes memory usage and reduces power consumption and area occupation. In particular, for simple problems, the memory requirement can be significantly reduced because high performance is achieved without using replay memory. Furthermore, we investigate the effects of the nonlinearity characteristics and two types of variation of non-ideal synaptic devices on the performance outcomes. In this work, thin-film transistor (TFT)-type flash memory cells are used as synaptic devices, and a simulation is conducted using a fully connected neural network with non-leaky integrate-and-fire (I&F) neurons. The proposed system shows strong immunity to device variations because an on-chip training scheme is adopted.
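One claim in the abstract — that simple problems can be learned well without replay memory — corresponds to a fully online Q-learning update in which each transition is used once and discarded. The sketch below illustrates such an update in plain NumPy; the toy dimensions, two-layer fully connected network, and learning rate are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, n_hidden = 4, 2, 16
gamma, lr = 0.9, 0.05

# Two-layer fully connected Q-network (sizes are placeholders).
W1 = rng.normal(0.0, 0.5, (n_states, n_hidden))
W2 = rng.normal(0.0, 0.5, (n_hidden, n_actions))

def q_values(x):
    h = np.maximum(0.0, x @ W1)  # ReLU hidden layer
    return h, h @ W2

def online_update(s, a, r, s_next, done):
    """One Q-learning step on a single transition -- no replay buffer."""
    global W1, W2
    h, q = q_values(s)
    _, q_next = q_values(s_next)
    target = r if done else r + gamma * np.max(q_next)
    td_error = q[a] - target  # gradient of 0.5 * td_error**2 w.r.t. q[a]
    # Backpropagate through both layers for the chosen action only.
    grad_W2 = np.outer(h, np.eye(n_actions)[a]) * td_error
    grad_h = W2[:, a] * td_error
    grad_W1 = np.outer(s, grad_h * (h > 0))
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1
    return td_error
```

Because each transition is consumed immediately, no transition storage is needed; this is the memory saving the abstract refers to, at the cost of the sample reuse and decorrelation that replay memory normally provides.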

Highlights

  • Neuromorphic computing inspired by the human brain has emerged as one of the most promising types of computing architectures

  • We focused on the on-chip training method using the BP algorithm in a hardware-based neural network. This is advantageous given its relatively high performance compared to the spike-timing-dependent plasticity (STDP) learning rule, low power consumption, high-speed training capabilities, and strong immunity to variations of non-ideal synaptic devices compared to the off-chip training method

  • In the opposite case, when the error spike is negative, a program pulse is applied to the synaptic device by overlapping the error spike applied to the source line and the positive part of the spike applied to the gate of the synaptic device, which depresses the synaptic weight
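The pulse-overlap scheme in the last highlight can be modeled behaviorally: the sign of the error spike selects either an erase pulse (which raises the flash cell's conductance, i.e. potentiation) or a program pulse (which lowers it, i.e. depression). The exponential bounded-step conductance model below and its parameters are a common behavioral assumption for flash-type synapses, not measured values from this work.

```python
import numpy as np

G_MIN, G_MAX = 0.1, 1.0    # conductance bounds (arbitrary units, assumed)
BETA_P, BETA_D = 3.0, 3.0  # nonlinearity of potentiation / depression (assumed)

def apply_pulse(g, error_sign):
    """Apply one erase (potentiation) or program (depression) pulse.

    The step size shrinks as conductance approaches its bound, which is
    the nonlinearity the paper examines for non-ideal synaptic devices.
    """
    if error_sign > 0:
        # Positive error spike: erase pulse -> conductance rises.
        step = (G_MAX - g) * (1.0 - np.exp(-1.0 / BETA_P))
        return g + step
    else:
        # Negative error spike: program pulse -> conductance falls.
        step = (g - G_MIN) * (1.0 - np.exp(-1.0 / BETA_D))
        return g - step
```

In this model repeated pulses of one polarity drive the conductance asymptotically toward the corresponding bound, so the effective weight update is state-dependent rather than constant.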



Introduction

Neuromorphic computing inspired by the human brain has emerged as one of the most promising types of computing architectures. On-chip training is immune to variations of non-ideal synaptic devices [10, 19, 20, 21, 22]. This approach is advantageous given its low power consumption and high-speed training capabilities, as both the weighted sum and the weight update occur in the hardware-based neural network [5, 23, 24]. We focused on an on-chip training method using the BP algorithm in a hardware-based neural network. This is advantageous given its relatively high performance compared to the STDP learning rule, as well as its low power consumption, high-speed training capabilities, and strong immunity to variations of non-ideal synaptic devices compared to the off-chip training method.
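The variation immunity claimed for on-chip training can be illustrated with a toy experiment: give every synapse its own effective step size (modeling device-to-device variation) and show that gradient descent still converges, because each update keeps the correct sign even when its magnitude varies. The 30 % step-size spread and the quadratic objective below are arbitrary assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(0.0, 1.0, 8)        # "conductance-coded" weights
target = rng.normal(0.0, 1.0, 8)   # weights that minimize the loss

# Per-device learning rates: nominal 0.1 with ~30 % spread, floored so
# that no device is completely stuck.
per_device_lr = 0.1 * np.clip(1.0 + 0.3 * rng.standard_normal(8), 0.3, None)

for _ in range(300):
    grad = w - target              # gradient of 0.5 * ||w - target||**2
    w -= per_device_lr * grad      # each device steps by its own amount
```

Off-chip training, by contrast, computes weight values assuming nominal devices and then transfers them, so the same variation appears as uncorrected weight error rather than being compensated during training.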

Synaptic device
Training method
Hardware-based deep Q-network
Results and discussion
Rush hour game
Network without replay memory
Conclusion
Compliance with ethical standards