Abstract

Learning automata (LA) are important learning mechanisms with applications in automated system design, biological system modeling, computer vision, and transportation. They play critical roles both in modeling a process and in generating the appropriate signal to control it. They update their action probabilities according to the inputs received from the environment and can thereby improve their own performance during operation. The action probability vector in LA serves two functions: 1) it determines the cost of convergence, i.e., the size of the sampling budget; and 2) it allocates the sampling budget among actions to identify the optimal one. Coupling these two functions creates a problem: most of the sampling budget goes to the currently estimated optimal action because of its high action probability, regardless of whether this helps identify the true optimal action. This work proposes a new class of LA that separates the allocation of the sampling budget from the action probability vector. It uses the action probability vector to determine the size of the sampling budget and then uses Optimal Computing Budget Allocation (OCBA) to allocate that budget in a way that maximizes the probability of identifying the true optimal action. Simulation results show a significant speedup, ranging from 10.93% to 65.94%, over the best existing LA algorithms.
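
To make the allocation step concrete, the following is a minimal sketch of the textbook OCBA allocation rule, not the specific scheme of this work, whose details the abstract does not give. The function name `ocba_allocation`, its parameters, and the `eps` floor are illustrative assumptions. It takes estimated mean rewards and standard deviations for each action and splits a given budget so that non-best actions with a small gap to the best and a high variance receive more samples.

```python
import math


def ocba_allocation(means, stds, budget, eps=1e-12):
    """Split `budget` samples among actions with the classical OCBA ratios.

    Hypothetical helper for illustration: `means` and `stds` are the current
    sample estimates per action, and a larger mean is assumed to be better.
    """
    k = len(means)
    best = max(range(k), key=lambda i: means[i])  # currently estimated optimal action
    others = [i for i in range(k) if i != best]

    ratios = [0.0] * k
    for i in others:
        # Gap to the best action; floored to avoid division by zero on ties.
        delta = max(means[best] - means[i], eps)
        sigma = max(stds[i], eps)
        # Non-best actions: N_i proportional to (sigma_i / delta_i)^2.
        ratios[i] = (sigma / delta) ** 2

    # Best action: N_b = sigma_b * sqrt(sum over i != b of (N_i / sigma_i)^2).
    ratios[best] = max(stds[best], eps) * math.sqrt(
        sum((ratios[i] / max(stds[i], eps)) ** 2 for i in others)
    )

    total = sum(ratios)
    # Real-valued shares; a practical scheme would round these to integers.
    return [budget * r / total for r in ratios]
```

For example, `ocba_allocation([0.9, 0.8, 0.5], [0.3, 0.3, 0.3], 100)` gives the close competitor (mean 0.8) far more samples than the clearly inferior action (mean 0.5). A purely probability-driven allocation would instead concentrate samples on the current best action.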
