Abstract

Reward-modulated spike-timing-dependent plasticity (R-STDP) is a brain-inspired reinforcement learning (RL) rule, exhibiting potential for decision-making tasks and artificial general intelligence. However, the hardware implementation of the reward-modulation process in R-STDP usually requires complicated Si complementary metal-oxide-semiconductor (CMOS) circuit design that causes high power consumption and large footprint. Here, a design with two synaptic transistors (2T) connected in a parallel structure is experimentally demonstrated. The 2T unit based on WSe2 ferroelectric transistors exhibits reconfigurable polarity behavior, where one channel can be tuned as n-type and the other as p-type due to nonvolatile ferroelectric polarization. In this way, opposite synaptic weight update behaviors with multilevel (>6 bit) conductance states, ultralow nonlinearity (0.56/-1.23), and large Gmax /Gmin ratio of 30 are realized. By applying positive/negative reward to (anti-)STDP component of 2T cell, R-STDP learning rules are realized for training the spiking neural network and demonstrated to solve the classical cart-pole problem, exhibiting a way for realizing low-power (32 pJ per forward process) and highly area-efficient (100 µm2 ) hardware chip for reinforcement learning.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.