Abstract

Actor-critic (AC) algorithms are an important class of reinforcement learning (RL) methods commonly used for continuous MDPs. However, few of their variants address sample efficiency. This paper proposes an AC variant, called AC-DPML, that tackles RL problems by combining AC with dual piecewise model learning and planning. Dual piecewise model learning refers to learning two models: a state-based piecewise model and an action-based piecewise model. The state-based piecewise model is constructed from a partition of the state space, while the action-based piecewise model is built on a partition of the action space. Both models are linearly approximated and learned from the samples assigned to each partition. The planning process of each model is launched only if its prediction error does not exceed an error threshold. Planning with the state-based piecewise model updates both the value function and the policy, whereas planning with the action-based piecewise model updates only the value function. Experimentally, AC-DPML is evaluated on two classic RL benchmarks with continuous MDPs. The results demonstrate that AC-DPML coordinates the two models effectively and outperforms representative methods in terms of convergence rate and sample efficiency.
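The abstract does not give implementation details, so the sketch below illustrates only one possible reading of the error-gated planning idea it describes: local linear models fit per region of a partitioned space, with planning skipped whenever the region's prediction error exceeds a threshold. All names and parameters here (PiecewiseLinearModel, error_threshold, plan_steps, the update rules) are illustrative assumptions, not the authors' actual method.

```python
import numpy as np

# Illustrative sketch only: the paper's exact model structure, update rules,
# and hyper-parameters are not specified in the abstract. Names such as
# PiecewiseLinearModel, error_threshold, and plan_steps are assumptions.

class PiecewiseLinearModel:
    """One local linear transition model per region of a partitioned space."""

    def __init__(self, n_regions, state_dim, lr=0.01):
        # One weight matrix per region: predicts the next state from (state, action).
        self.W = [np.zeros((state_dim, state_dim + 1)) for _ in range(n_regions)]
        self.lr = lr

    def predict(self, region, s, a):
        x = np.append(s, a)
        return self.W[region] @ x

    def update(self, region, s, a, s_next):
        # Gradient step on the squared prediction error, using only the
        # samples assigned to this region of the partition.
        x = np.append(s, a)
        err = s_next - self.W[region] @ x
        self.W[region] += self.lr * np.outer(err, x)
        return np.linalg.norm(err)  # prediction error used for the planning gate


def planning_step(model, region, s, a, critic_update, actor_update=None,
                  error=0.0, error_threshold=0.5, plan_steps=5):
    """Error-gated planning: plan only if the local model is accurate enough."""
    if error > error_threshold:
        return  # model not yet trusted in this region; skip planning
    for _ in range(plan_steps):
        s_sim = model.predict(region, s, a)
        critic_update(s, a, s_sim)          # value function is always updated
        if actor_update is not None:        # policy updated only when planning
            actor_update(s, a, s_sim)       # with the state-based piecewise model
```

Under this reading, the two piecewise models would share the same gating logic, with `actor_update` passed only for the state-based model, matching the abstract's statement that action-based planning updates the value function alone.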
