Abstract

Learning automata update their action probabilities on the basis of the response they receive from a random environment. They use a reward adaptation rate for a favorable environment response and a penalty adaptation rate for an unfavorable one. In this correspondence, we introduce Multiple Response learning automata by explicitly classifying the environment's responses into a reward (favorable) set and a penalty (unfavorable) set. We derive a new reinforcement scheme that applies a distinct reward or penalty rate to each corresponding reward (favorable) or penalty (unfavorable) response. Well-known learning automata, such as the L(R-P), L(R-I), and L(R-εP) schemes, are special cases of these Multiple Response learning automata. The proposed automata are feasible at each step, nonabsorbing (when the penalty functions are positive), and strictly distance diminishing. Finally, we provide conditions under which they are ergodic and expedient.
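
To make the idea concrete, below is a minimal Python sketch of a linear multiple-response update, where each response class carries its own adaptation rate. The class name MultipleResponseLA, the per-response rate dictionaries, and the specific linear update form are illustrative assumptions based on standard linear reward-penalty schemes, not the paper's exact reinforcement scheme.

```python
import random

class MultipleResponseLA:
    """Sketch of a Multiple Response learning automaton (assumed linear form).

    Responses in the reward set increase the chosen action's probability;
    responses in the penalty set decrease it. Each response class k has
    its own adaptation rate, generalizing the single reward/penalty rates
    of the classical L(R-P) scheme.
    """

    def __init__(self, n_actions, reward_rates, penalty_rates):
        self.n = n_actions
        self.p = [1.0 / n_actions] * n_actions   # start from the uniform distribution
        self.reward_rates = reward_rates          # {response: rate}, reward (favorable) set
        self.penalty_rates = penalty_rates        # {response: rate}, penalty (unfavorable) set

    def choose(self):
        # Sample an action according to the current probability vector.
        return random.choices(range(self.n), weights=self.p)[0]

    def update(self, action, response):
        if response in self.reward_rates:         # favorable response: reward update
            a = self.reward_rates[response]
            for j in range(self.n):
                if j == action:
                    self.p[j] += a * (1.0 - self.p[j])
                else:
                    self.p[j] -= a * self.p[j]
        else:                                     # unfavorable response: penalty update
            b = self.penalty_rates[response]
            for j in range(self.n):
                if j == action:
                    self.p[j] -= b * self.p[j]
                else:
                    self.p[j] += b * (1.0 / (self.n - 1) - self.p[j])
```

For 0 < rate < 1 these linear updates keep the probabilities in a valid distribution at every step (the feasibility property the abstract mentions); setting all penalty rates to zero recovers an L(R-I)-type scheme, while a single shared reward rate and penalty rate recovers L(R-P).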
