Abstract

Reinforcement learning, which does not require prior training data, has been intensively investigated and developed in artificial intelligence for applications such as autonomous driving vehicles, robot control, internet advertising, and elastic optical networks. However, the computational cost of reinforcement learning with deep neural networks is extremely high, and reducing this learning cost is a challenging issue. We propose a photonic on-line implementation of reinforcement learning using optoelectronic delay-based reservoir computing and demonstrate it both experimentally and numerically. In the proposed scheme, reinforcement learning is accelerated to a rate of several megahertz because reservoir computing requires no learning process for its internal connection weights. We evaluate the proposed scheme on two benchmark tasks, CartPole-v0 and MountainCar-v0. Our results represent the first hardware implementation of reinforcement learning based on photonic reservoir computing and pave the way for fast and efficient reinforcement learning as a novel photonic accelerator.
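To make the idea of a delay-based reservoir concrete, here is a minimal Python sketch of a simplified discrete-time model. It assumes a single Ikeda-type node with a sin^2 nonlinearity (the response of a Mach-Zehnder modulator) that is time-multiplexed into virtual nodes by a fixed random input mask. The class name `DelayReservoir` and all parameter values (`n_nodes`, `beta`, `gamma`, `phi`, `alpha`) are hypothetical illustrations, not the authors' experimental settings. Note that nothing inside the reservoir is trained, which is the property the abstract links to the speed-up.

```python
import math
import random


class DelayReservoir:
    """Hypothetical, simplified model of a delay-based reservoir.

    One nonlinear node with a sin^2 response is time-multiplexed into
    `n_nodes` virtual nodes along a delay loop. The input mask and all
    internal parameters stay fixed: nothing in this class is trained.
    """

    def __init__(self, n_inputs, n_nodes=50, beta=0.8, gamma=0.5,
                 phi=0.2 * math.pi, alpha=0.5, seed=0):
        rng = random.Random(seed)
        # Fixed random binary input mask: one row of weights per virtual node.
        self.mask = [[rng.choice((-1.0, 1.0)) for _ in range(n_inputs)]
                     for _ in range(n_nodes)]
        self.beta = beta      # feedback strength of the delay loop
        self.gamma = gamma    # input scaling
        self.phi = phi        # bias (operating point) of the nonlinearity
        self.alpha = alpha    # coupling between neighbouring virtual nodes
        self.state = [0.0] * n_nodes

    def step(self, u):
        """Inject one input vector `u` and return the new virtual-node states."""
        prev = self.state
        new = [0.0] * len(prev)
        neighbour = prev[-1]  # last virtual node of the previous loop round trip
        for i, row in enumerate(self.mask):
            drive = self.gamma * sum(m * x for m, x in zip(row, u))
            # Delayed self-feedback plus masked input through the sin^2 node.
            response = math.sin(self.beta * prev[i] + drive + self.phi) ** 2
            # Finite node response time mixes in the preceding virtual node.
            new[i] = self.alpha * neighbour + (1.0 - self.alpha) * response
            neighbour = new[i]
        self.state = new
        return list(new)


# Example: feed one made-up 4-variable observation (CartPole-v0 sized).
res = DelayReservoir(n_inputs=4, n_nodes=50)
features = res.step([0.01, -0.02, 0.03, 0.0])
print(len(features))  # 50
```

Each call to `step` maps one observation to a vector of virtual-node states. Because the states depend on the previous loop round trip, the reservoir carries a fading memory of past inputs, and only a linear readout on top of these states would need training.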

Highlights

  • Reinforcement learning is a machine learning scheme that trains an action policy to maximize the total reward in a particular situation or environment[5]

  • Various applications of reinforcement learning have been studied, such as autonomous driving vehicles[6], robot control[7], communication security[8], and elastic optical networks[9]
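The "total reward" in the definition above is usually formalized as a return. The sketch below assumes the common discounted form G_t = r_t + gamma * G_{t+1}, where the discount factor `gamma` is an assumption not stated in the source, and computes the return for every step of one episode in a single backward pass. The function name `discounted_returns` is hypothetical.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return [G_0, G_1, ...] with G_t = r_t + gamma * G_{t+1} for one episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    # Walk backwards so each G_t reuses the already computed G_{t+1}.
    for t in range(len(rewards) - 1, -1, -1):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
```

With `gamma=1.0` this reduces to the plain total reward. For example, `discounted_returns([1.0] * 200, gamma=1.0)[0]` gives `200.0`, the maximum return of a CartPole-v0 episode, which yields a reward of +1 per step and is capped at 200 steps.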

Summary

Introduction

Reinforcement learning, which does not require prior training data, has been intensively investigated and developed in artificial intelligence for applications such as autonomous driving vehicles, robot control, internet advertising, and elastic optical networks. An algorithm based on a deep neural network, Agent57, was proposed in 2020[10]; it achieved scores above the human baseline on all 57 Atari 2600 games. However, learning the connection weights of deep neural networks through reinforcement learning entails high computational costs because the network weights must be trained repeatedly on vast amounts of playing data[14,15]. This cost reflects the large number of learnable parameters that deep neural networks need to reach high performance, a property known as overparameterization[15–17]. Reservoir computing avoids much of this cost because its internal connection weights are fixed and only the output weights are trained. A photonic implementation of reservoir computing, based on the idea of photonic accelerators[29], can realize fast information processing with low learning costs[30–35].
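The low learning cost of reservoir computing comes from training only the output layer. The following is a minimal sketch of how that plays out in reinforcement learning, assuming Q-learning with a linear readout (semi-gradient Q-learning with linear function approximation, a standard technique that may differ from the authors' exact training rule). Action values are Q(s, a) = w_a · x(s), and each update touches only the readout weights `w`. The names `LinearQReadout` and `fixed_random_features` are hypothetical; the frozen random tanh projection stands in for the reservoir and, unlike a real delay-based reservoir, has no memory.

```python
import math
import random


def fixed_random_features(n_inputs, n_features, seed=0):
    """Hypothetical stand-in for a reservoir: a frozen random tanh projection."""
    rng = random.Random(seed)
    proj = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)]
            for _ in range(n_features)]
    return lambda s: [math.tanh(sum(p * si for p, si in zip(row, s)))
                      for row in proj]


class LinearQReadout:
    """Action values Q(s, a) = w_a . x(s) from fixed reservoir features x(s).

    Only the readout weights `w` are learned; the features are never updated.
    """

    def __init__(self, n_features, n_actions, lr=0.05, gamma=0.99):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr        # step size of the readout update
        self.gamma = gamma  # discount factor

    def q_values(self, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in self.w]

    def update(self, x, action, reward, x_next, done):
        """One semi-gradient Q-learning step on the chosen action's weights."""
        target = reward
        if not done:
            target += self.gamma * max(self.q_values(x_next))
        td_error = target - self.q_values(x)[action]
        row = self.w[action]
        for i, xi in enumerate(x):
            row[i] += self.lr * td_error * xi
        return td_error


features = fixed_random_features(n_inputs=4, n_features=30)
agent = LinearQReadout(n_features=30, n_actions=2)
x = features([1.0, -0.5, 0.5, 0.2])
# Repeatedly learn a terminal transition with reward 1 for action 1.
for _ in range(1000):
    agent.update(x, action=1, reward=1.0, x_next=x, done=True)
print(round(agent.q_values(x)[1], 3))  # 1.0
```

In this configuration only `n_actions × n_features` = 60 weights are learned, and each update is a single pass over the feature vector, in contrast to backpropagating through every layer of a deep network.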
