Abstract

Traditional Learning Automata (LA) operate on the understanding that actions are chosen purely based on the “state” in which the machine is. This modus operandi completely ignores any estimation of the Random Environment’s (RE’s) (specified as \(\mathbb{E}\)) reward/penalty probabilities. To take these into consideration, Estimator/Pursuit LA utilize “cheap” estimates of the Environment’s reward probabilities, enabling the automata to converge an order of magnitude faster. The concept is quite simple: inexpensive estimates of the reward probabilities can be used to rank the actions. Thereafter, when the action probability vector has to be updated, it is done not on the basis of the Environment’s response alone, but also based on the ranking of these estimates. While this phenomenon has been utilized in the field of LA, until recently it had not been incorporated into solutions that solve partitioning problems. In this paper (the second author gratefully acknowledges the partial support of NSERC, the Natural Sciences and Engineering Research Council of Canada), we present a comprehensive survey of how the “Pursuit” learning paradigm can be and has been used in Object Partitioning. The results demonstrate that incorporating this paradigm can hasten the partitioning by an order of magnitude.
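To make the pursuit update concrete, the following is a minimal Python sketch in the spirit of the classical pursuit scheme: maximum-likelihood estimates of the reward probabilities rank the actions, and the action probability vector is pulled toward the unit vector of the currently best-ranked action. The class name PursuitLA, the learning rate lam, and the three-action environment in the usage snippet are illustrative assumptions, not the paper’s implementation.

```python
import numpy as np

class PursuitLA:
    """Sketch of a pursuit learning automaton (reward-penalty flavour)."""

    def __init__(self, n_actions, lam=0.02):
        self.p = np.full(n_actions, 1.0 / n_actions)  # action probability vector
        self.rewards = np.zeros(n_actions)            # times each action was rewarded
        self.pulls = np.zeros(n_actions)              # times each action was chosen
        self.lam = lam                                # learning-rate parameter

    def choose(self, rng):
        return rng.choice(len(self.p), p=self.p)

    def update(self, action, rewarded):
        # Maintain the "cheap" running estimates of the reward probabilities.
        self.pulls[action] += 1
        self.rewards[action] += int(rewarded)
        d_hat = np.divide(self.rewards, self.pulls,
                          out=np.zeros_like(self.rewards),
                          where=self.pulls > 0)
        # Pursue the best-ranked action: move p toward its unit vector e_m,
        # regardless of whether the current response was a reward or a penalty.
        e_m = np.zeros_like(self.p)
        e_m[np.argmax(d_hat)] = 1.0
        self.p = (1.0 - self.lam) * self.p + self.lam * e_m

# Usage with a hypothetical 3-action environment:
rng = np.random.default_rng(0)
true_rewards = [0.2, 0.8, 0.5]
la = PursuitLA(n_actions=3)
for _ in range(2000):
    a = la.choose(rng)
    la.update(a, rng.random() < true_rewards[a])
print(la.p)  # probability mass concentrates on action 1, the best action
```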
