Abstract

We investigate the effect of a memory parameter on the performance of adaptive decision making using a tug-of-war method with the chaotic oscillatory dynamics of a semiconductor laser. We experimentally generate chaotic temporal waveforms of the semiconductor laser with optical feedback and apply them for adaptive decision making in solving a multiarmed bandit problem that aims at maximizing the total reward from slot machines whose hit probabilities are dynamically switched. We examine the dependence of making correct decisions on different values of the memory parameter. The degree of adaptivity is found to be enhanced with a smaller memory parameter, whereas the degree of convergence to the correct decision is higher for a larger memory parameter. The relations among the adaptivity, environmental changes, and the difficulties of the problem are also discussed considering the requirement of past decisions. This examination of ultrafast adaptive decision making highlights the importance of memorizing past events and paves the way for future photonic intelligence.
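The role of the memory parameter described above can be illustrated with a highly simplified, assumption-laden sketch of a tug-of-war style decision rule. This is our own illustrative code, not the authors' implementation: Gaussian noise stands in for the chaotic laser waveform that drives exploration, and the function and variable names (`tow_bandit`, `alpha`, `x`) are ours.

```python
import random

def tow_bandit(probs, trials, alpha):
    """Play a two-armed bandit with a tug-of-war style decision variable.

    x accumulates past outcomes; alpha is the memory parameter:
    alpha close to 1 weights past results heavily (higher convergence
    to the correct decision), while a small alpha forgets quickly
    (higher adaptivity when hit probabilities are switched). Gaussian
    noise substitutes for the chaotic fluctuation used in the paper.
    """
    x = 0.0       # decision variable: positive (plus noise) favors machine 0
    hits = 0
    for _ in range(trials):
        arm = 0 if x + random.gauss(0.0, 1.0) >= 0 else 1
        hit = random.random() < probs[arm]
        delta = 1.0 if hit else -1.0   # pull toward the machine that paid off
        if arm == 1:
            delta = -delta             # machine 1 pulls x in the negative direction
        x = alpha * x + delta
        hits += hit
    return hits / trials               # average hit rate (AHR)

random.seed(1)
ahr = tow_bandit([0.8, 0.2], trials=5000, alpha=0.99)
```

Rerunning the sketch with the hit probabilities swapped mid-run would show the trade-off reported in the abstract: a smaller `alpha` recovers faster after the switch, a larger `alpha` converges more firmly before it.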

Highlights

  • Artificial intelligence based on deep learning, as a type of supervised learning, has been rapidly deployed in society

  • The average hit rate (AHR) represents the total reward acquisition rate that is defined as the ratio of the number of “hits” to the total number of trials and cycles

  • We investigated adaptive decision making based on the TOW method using the temporal waveforms of a chaotic semiconductor laser



Introduction

Artificial intelligence based on deep learning, a type of supervised learning, has been rapidly deployed in society. Reinforcement learning is another branch of machine learning that involves trial-and-error processes to accommodate unknown agents in environments [1, 2]. The multiarmed bandit (MAB) problem is a fundamental problem in reinforcement learning, wherein the total reward (e.g., the number of coins) from multiple slot machines with unknown hit probabilities needs to be maximized [1, 2, 5]. To solve the MAB problem, it is important to estimate which slot machine may exhibit the highest hit probability by playing a series of slot machines (called exploration) and to use that estimate to gain more rewards (called exploitation). This trade-off is known as the exploration-exploitation dilemma [1, 2] in the MAB problem.
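The exploration-exploitation trade-off can be made concrete with a minimal epsilon-greedy bandit player. This is a generic textbook strategy for comparison, not the tug-of-war method studied in the paper; all names here (`epsilon_greedy`, `eps`, `plays`, `wins`) are our own.

```python
import random

def epsilon_greedy(probs, trials=5000, eps=0.1):
    """Play a multiarmed bandit: with probability eps pick a random
    machine (exploration), otherwise play the machine with the best
    empirical hit rate observed so far (exploitation)."""
    n = len(probs)
    plays = [0] * n
    wins = [0] * n
    total_hits = 0
    for _ in range(trials):
        if random.random() < eps:
            arm = random.randrange(n)  # exploration
        else:
            # exploitation; untried machines get priority via an infinite estimate
            arm = max(range(n),
                      key=lambda i: wins[i] / plays[i] if plays[i] else float("inf"))
        hit = random.random() < probs[arm]
        plays[arm] += 1
        wins[arm] += hit
        total_hits += hit
    return total_hits / trials         # average hit rate (AHR)

random.seed(0)
ahr = epsilon_greedy([0.2, 0.5, 0.8])
```

With a fixed exploration rate `eps`, some plays are knowingly spent on inferior machines; that cost is the price of detecting which machine is currently best, which is exactly the dilemma the paper's adaptive scheme addresses.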

