Abstract

The precursor field to Reinforcement Learning is that of Learning Automata (LA). Within this field, Estimator Algorithms (EAs) can be said to be the state-of-the-art. Further, the subset of Pursuit Algorithms (PAs), discovered by Thathachar and Sastry [34, 39], comprised the pioneering schemes. This chapter contains a comprehensive survey of the various EAs, and the most recent convergence results for PAs. Unlike the prior LA, EAs are based on a fundamentally distinct phenomenon. They are also the most accurate LA, converging in the least time. EAs operate on two vectors, namely, the action probability vector, which is updated using responses from the Environment, and the quickly-computed estimates of the reward probabilities of the various actions. The proofs that they are \(\varepsilon\)-optimal are thus very complex: they have to incorporate two rather non-orthogonal phenomena, namely, the convergence of these estimates and the convergence of the probabilities of selecting the various actions. For almost three decades, the reported proofs for PAs possessed an infirmity (or flaw), which we refer to as the claim of the "monotonicity" property. This flaw was discovered by the authors of [37], who also provided an alternate proof for a specific PA in which the scheme's parameter decreased with time. This chapter first records all the reported EAs. It then presents a comprehensive survey of the proofs from a different perspective. These proofs do not require that the sequence of probabilities of selecting the optimal action satisfies the monotonicity property. Rather, whenever any action probability is close enough to unity, they require that the process jump to an absorbing barrier at the next time instant, i.e., in a single step. By imposing this constraint, the proofs invoke a weaker property, i.e., the submartingale property of \(p_m(t)\), to demonstrate \(\varepsilon\)-optimality. We have thus proven the \(\varepsilon\)-optimality of the Absorbing CPA [49, 50], the Discretized PA [51, 52], and the family of Bayesian PAs [53], where the estimates are obtained by a Bayesian (rather than a Maximum Likelihood (ML)) process.

Keywords: Pursuit learning automata (LA) · Martingale properties of LA · Convergence proofs of LA
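To make the two-vector mechanism concrete, the following is a minimal Python sketch of one iteration of a generic Continuous Pursuit Algorithm. It is only an illustration of the pursuit phenomenon described above, not the exact scheme of any of the cited works; the learning parameter `lam`, the function name `pursuit_step`, and the simulated Environment `reward_probs` are assumptions introduced here for the example.

```python
import random

def pursuit_step(p, d_hat, counts, successes, reward_probs, lam=0.01):
    """One iteration of a generic continuous pursuit automaton (sketch only)."""
    r = len(p)
    # Select an action according to the current action probability vector p(t).
    a = random.choices(range(r), weights=p)[0]
    # Simulated Environment: action a is rewarded with probability reward_probs[a].
    beta = 1 if random.random() < reward_probs[a] else 0
    # Update the Maximum Likelihood estimate of action a's reward probability.
    counts[a] += 1
    successes[a] += beta
    d_hat[a] = successes[a] / counts[a]
    # "Pursue" the currently estimated-best action m: move p(t) a step of size
    # lam toward the unit vector e_m, i.e., p(t+1) = (1 - lam) p(t) + lam e_m.
    m = max(range(r), key=lambda i: d_hat[i])
    for i in range(r):
        p[i] = (1 - lam) * p[i] + (lam if i == m else 0.0)
    return a, beta

# Hypothetical usage with four actions and an assumed Environment.
r = 4
p = [1 / r] * r
d_hat = [0.0] * r
counts = [0] * r
successes = [0] * r
reward_probs = [0.2, 0.4, 0.7, 0.5]
for t in range(10_000):
    pursuit_step(p, d_hat, counts, successes, reward_probs)
```

In the Absorbing CPA analyzed in [49, 50], an additional rule forces the process to an absorbing barrier (a unit vector) in a single step once some action probability is close enough to unity; that rule, which is central to the proofs surveyed here, is omitted from this sketch for brevity.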
