Abstract
How can we harness nature’s power for computations? Our society comprises a collection of individuals, each of whom handles decision-making tasks that can be abstracted as computational problems of finding the most profitable option from a set of options that provide rewards stochastically. Society is expected to maximize the total rewards, while the individuals compete for common rewards. Such collective decision making is formulated as the “competitive multi-armed bandit problem (CBP).” Herein, we demonstrate an analog computing device that uses numerous fluids in coupled cylinders to efficiently solve the CBP and maximize social rewards without paying the huge computational cost that is conventionally required. The fluids estimate the reward probabilities of the options to exploit past knowledge, and they generate random fluctuations to explore new knowledge; for this exploration, using the fluid-derived fluctuations is more advantageous than applying artificial fluctuations. The fluid-derived fluctuations, which require exponentially many combinatorial efforts when emulated on conventional digital computers, would exhibit their maximal computational power when tackling classes of problems that are more complex than the CBP. Extending the current configuration of the device would trigger further studies on harnessing the huge computational power of natural phenomena to solve a wide variety of complex societal problems.
Highlights
The benefits to an organization (the whole) and those to its constituent members (parts) sometimes conflict.
Our model tackles the multiplayer, multi-armed bandit problem, which differs from conventional situations in three respects: (1) every element of the payoff matrix is a probability with which a reward is obtained; (2) a player’s selection is made by referring to information accumulated through all past events; and (3) the “social maximum” that we are interested in does not always coincide with the Nash equilibrium.
Summary
The benefits to an organization (the whole) and those to its constituent members (parts) sometimes conflict. Consider a situation wherein traffic congestion is caused by a driver who makes a selfish decision to pursue the individual benefit of arriving quickly at a destination. Marden et al. proposed payoff-based dynamics for multiplayer weakly acyclic games [4], which focused on the Nash equilibrium achieved through a Markovian process. Our model tackles the multiplayer, multi-armed bandit problem, which differs from conventional situations in three respects: (1) every element of the payoff matrix is a probability with which a reward is obtained; (2) a player’s selection is made by referring to information accumulated through all past events; and (3) the “social maximum” that we are interested in does not always coincide with the Nash equilibrium. We demonstrate a method for exploiting the computational power of the physical dynamics of numerous fluids in coupled cylinders to efficiently solve the problem.
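Points (1) and (3) can be made concrete with a minimal Python sketch of a two-player, two-machine case. The reward probabilities and the rule that a collision halves each player's chance of a reward are illustrative assumptions, not values or rules taken from the paper, and the function names are hypothetical. The sketch only enumerates the payoff matrix by brute force; it does not model the fluid device.

```python
from itertools import product
from math import isclose

# Hypothetical reward probabilities for two machines (assumed, not from the paper).
P = {"A": 0.9, "B": 0.2}
OPTIONS = list(P)


def payoff(choice_1, choice_2):
    """Return each player's reward probability for a joint choice.

    Assumed rule: if both players pick the same machine, each player's
    chance of obtaining the reward is halved.
    """
    if choice_1 == choice_2:
        return (P[choice_1] / 2, P[choice_2] / 2)
    return (P[choice_1], P[choice_2])


def social_maxima():
    """Joint choices that maximize the sum of both players' reward probabilities."""
    totals = {joint: sum(payoff(*joint)) for joint in product(OPTIONS, repeat=2)}
    best = max(totals.values())
    return [joint for joint, total in totals.items() if isclose(total, best)]


def nash_equilibria():
    """Joint choices from which neither player gains by deviating alone."""
    result = []
    for c1, c2 in product(OPTIONS, repeat=2):
        r1, r2 = payoff(c1, c2)
        p1_stays = all(payoff(alt, c2)[0] <= r1 for alt in OPTIONS)
        p2_stays = all(payoff(c1, alt)[1] <= r2 for alt in OPTIONS)
        if p1_stays and p2_stays:
            result.append((c1, c2))
    return result


if __name__ == "__main__":
    print("Social maxima:  ", social_maxima())
    print("Nash equilibria:", nash_equilibria())
```

Under these assumed numbers, both players crowding onto machine A is the only Nash equilibrium, while the total reward is largest when the players split across A and B. This is the kind of gap between individual and social benefit that the summary describes.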