Abstract

Fixed Structure Stochastic Automata (FSSA), Variable Structure Stochastic Automata (VSSA), and their discretized versions have been significantly improved by utilizing inexpensive estimates of the actions' reward probabilities. These estimate-based automata represent the fastest Learning Automata (LA) to date. However, the concept of ordering the actions has never been used within the field, because there is no way to order the actions a priori. The recently introduced Hierarchical Discrete Pursuit Automaton (HDPA) places two-action LA at the nodes of a tree, so that the arrangement of the actions at the leaves implies an underlying ordering. In this paper, we show that if estimates are available (as in the case of estimator algorithms), they can be used to place the actions at the leaf level so as to further enhance the convergence capabilities of the overall ensemble of two-action LA. This paper contains the design of this enhanced HDPA, the proof of this assertion, and simulation results on benchmark Environments. Based on the results, we believe that it is the fastest and most accurate LA to date. Our position is that it will be very hard to beat its performance, since it incorporates all the salient features of the entire field of LA.
