Greedy Action Research Articles

One of the most common problems in reinforcement learning systems (e.g. Q-learning) is reducing the number of trials needed to converge to an optimal policy. The k-certainty exploration method was proposed as one solution to this problem. Miyazaki reported that this method could determine an optimal policy faster than Q-learning in Markov decision processes (MDPs). Although it is a very efficient learning method, we propose an improvement that makes it more efficient. In the k-certainty exploration method, when there is no k-uncertainty rule (a rule that has not yet been selected k times) in the current state, the agent sometimes walks randomly until it finds a state where it can select a k-uncertainty rule. We think that this behavior is inefficient. To reduce this wasted behavior, we propose combining the k-certainty exploration method with dynamic programming (DP). Miyazaki's system uses DP only after all rules have been executed at least k times, whereas our method uses DP alongside the k-certainty exploration method during learning. Our method takes one of two kinds of action. When the agent can select k-uncertainty rules, one of these rules is selected at random, as in the k-certainty exploration method. When there is no k-uncertainty rule, the agent behaves differently from the k-certainty exploration method: our method uses DP to compute an optimal policy for moving from the current state to a state in which some k-uncertainty rules remain. The model for DP is constructed using only known states. The outline is as follows. First, the agent builds a map from known states only. In the map, the goals are the states that have k-uncertainty rules, and arbitrary state values are assigned to these states. Note that the map is not given from outside; it is built from experience alone. Next, the state values of the states in the map that have only k-certainty rules are computed by DP (we used policy iteration). Finally, the agent keeps selecting the greedy action until it arrives at a state in which it can select a k-uncertainty rule. With this improvement, we expect the method to determine an optimal policy faster than the k-certainty exploration method, and we have verified this by computer simulation.
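The two-mode behavior described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the `counts` data layout, the discount factor `gamma`, and the goal value of 1.0 are all hypothetical, and value iteration is swapped in for the policy iteration the abstract mentions because it is shorter to write.

```python
import random

def uncertain_rules(counts, state, actions, k):
    """Actions in `state` tried fewer than k times (assumed meaning of k-uncertainty)."""
    return [a for a in actions
            if sum(counts.get((state, a), {}).values()) < k]

def q_value(counts, value, state, action, gamma):
    """Expected discounted next-state value under the empirical transition model."""
    outcomes = counts[(state, action)]
    total = sum(outcomes.values())  # > 0 for k-certain rules when k >= 1
    return gamma * sum(n / total * value[s2] for s2, n in outcomes.items())

def plan_to_frontier(counts, actions, k, gamma=0.9, sweeps=100):
    """Build a map from experience only and plan toward states with untried rules."""
    # Known states: everything seen as a source or a destination (assumed sortable).
    states = sorted({s for (s, _) in counts} |
                    {s2 for outcomes in counts.values() for s2 in outcomes})
    # Goals of the map: states that still have k-uncertainty rules.
    frontier = {s for s in states if uncertain_rules(counts, s, actions, k)}
    # Arbitrary fixed value at the goals (1.0 is an assumption), 0 elsewhere.
    value = {s: (1.0 if s in frontier else 0.0) for s in states}
    for _ in range(sweeps):  # value iteration, standing in for policy iteration
        for s in states:
            if s not in frontier:
                value[s] = max(q_value(counts, value, s, a, gamma) for a in actions)
    policy = {s: max(actions, key=lambda a: q_value(counts, value, s, a, gamma))
              for s in states if s not in frontier}
    return frontier, policy

def select_action(counts, state, actions, k, rng=random):
    """Pick an untried rule at random if one exists, otherwise act greedily."""
    candidates = uncertain_rules(counts, state, actions, k)
    if candidates:
        return rng.choice(candidates)
    _, policy = plan_to_frontier(counts, actions, k)
    return policy[state]

# Toy corridor: states 0 and 1 are fully explored, state 2 has never tried "right".
ACTIONS = ["left", "right"]
counts = {
    (0, "left"): {0: 1}, (0, "right"): {1: 1},
    (1, "left"): {0: 1}, (1, "right"): {2: 1},
    (2, "left"): {1: 1},
}
frontier, policy = plan_to_frontier(counts, ACTIONS, k=1)
```

In this toy corridor the only goal of the map is state 2, so the greedy policy walks the agent there directly instead of wandering at random. When no state has an untried rule left, the planned values are all zero and the greedy choice is arbitrary, which corresponds to the point where exploration is finished.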

Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discounted total reward received by an agent, whereas in many domains the natural criterion is to optimize the average reward per time step. In this paper, we introduce a model-based average-reward reinforcement learning method called H-learning and show that it converges more quickly and robustly than its discounted counterpart in the domain of scheduling a simulated Automatic Guided Vehicle (AGV). We also introduce a version of H-learning that automatically explores the unexplored parts of the state space while always choosing greedy actions with respect to the current value function. We show that this "Auto-exploratory H-learning" performs better than the previously studied exploration strategies. To scale H-learning to larger state spaces, we extend it to learn action models and reward functions in the form of dynamic Bayesian networks, and approximate its value function using local linear regression. We show that both of these extensions are effective in significantly reducing the space requirement of H-learning and in making it converge faster in some AGV scheduling tasks.
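For readers comparing the two criteria this abstract contrasts, they are commonly written as below. The notation is an assumption for illustration and is not taken from the paper: $r_t$ is the reward at step $t$, $\pi$ the policy, $s$ the start state, and $\gamma$ the discount factor.

```latex
% Discounted total reward (the criterion most RL methods optimize)
V_\gamma^\pi(s) = \mathbb{E}_\pi\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s\right],
\qquad 0 \le \gamma < 1

% Average reward per time step (the criterion H-learning targets)
\rho^\pi(s) = \lim_{N \to \infty} \frac{1}{N}\,
\mathbb{E}_\pi\!\left[\sum_{t=0}^{N-1} r_t \,\middle|\, s_0 = s\right]
```

The first expression weights early rewards more heavily through $\gamma$, while the second treats every time step equally, which is why it can be the more natural fit for ongoing tasks such as the AGV scheduling domain mentioned above.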

Articles published on Greedy Action

A cooperative learning approach to Mixed Performance Controller design: a behavioural viewpoint

An Efficient Exploration Method Using k-Certainty Exploration Method and Dynamic Programming under Markov Decision Processes

Model-based average reward reinforcement learning

