Abstract

We consider the problem of extending Training an Agent Manually via Evaluative Reinforcement (TAMER) to continuous state and action spaces. The original TAMER framework allows a non-technical human to train an agent through a natural form of feedback, positive or negative. The advantages of TAMER have been demonstrated on applications such as training agents for Tetris and Mountain Car using only human feedback, and for Cart-pole and Mountain Car using both human feedback and environment reward (i.e., augmenting reinforcement learning with human feedback). However, those methods were originally designed for problems with discrete states and actions, or with continuous states and discrete actions. We propose an extension of TAMER to both continuous states and actions, called ACTAMER. The new framework allows any general function approximator to model the human trainer's reinforcement signal. Moreover, we investigate combining ACTAMER with reinforcement learning (RL). The combination of human feedback and RL is studied in two settings: sequential and simultaneous. Our experimental results show that the proposed method successfully allows a human to train an agent in two continuous state-action domains: Mountain Car and Cart-pole (balancing).
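As a rough illustration of the idea (not the authors' implementation), the sketch below models the human reinforcement signal H(s, a) with a simple linear function approximator and selects continuous actions by sampling candidates and maximizing the predicted feedback. The class name, feature construction, and sampling-based action selection are all assumptions made here for illustration; ACTAMER itself admits any general function approximator.

```python
import numpy as np

class ACTamerSketch:
    """Minimal TAMER-style sketch for continuous states and actions.

    Hypothetical illustration only: the human-reward model H(s, a) is
    linear in simple joint features, whereas ACTAMER allows any general
    function approximator of the human reinforcement signal.
    """

    def __init__(self, state_dim, action_dim, learning_rate=0.05):
        # Feature vector: [state, action, state-action interactions]
        self.n_features = state_dim + action_dim + state_dim * action_dim
        self.w = np.zeros(self.n_features)
        self.lr = learning_rate

    def _features(self, state, action):
        return np.concatenate([state, action, np.outer(state, action).ravel()])

    def predict_h(self, state, action):
        # Predicted human reinforcement for taking `action` in `state`.
        return float(self.w @ self._features(state, action))

    def update(self, state, action, human_feedback):
        # Supervised (delta-rule) update toward the scalar human signal,
        # positive or negative, as in TAMER.
        phi = self._features(state, action)
        self.w += self.lr * (human_feedback - self.w @ phi) * phi

    def select_action(self, state, action_low, action_high, n_candidates=256):
        # Continuous actions: approximate argmax_a H(s, a) by sampling
        # candidate actions uniformly and keeping the best-scoring one.
        candidates = np.random.uniform(action_low, action_high,
                                       size=(n_candidates, len(action_low)))
        scores = np.array([self.predict_h(state, a) for a in candidates])
        return candidates[int(np.argmax(scores))]
```

In a simultaneous combination with RL, one plausible variant (again an assumption, not the paper's stated algorithm) is to rank candidate actions by a weighted sum of the predicted human reinforcement and the RL value estimate rather than by H(s, a) alone.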
