Abstract

Recent empirical success has led to a rise in popularity of the options framework for Hierarchical Reinforcement Learning (HRL). This framework tackles the scalability problem in Reinforcement Learning (RL) by introducing a layer of abstraction (i.e., high-level options) over the (low-level) decision process. Hierarchical Imitation Learning (HIL) is the problem of learning low-level and high-level policies within HRL from expert demonstrations consisting only of the low-level states and actions, with the high-level options being hidden (or latent). Due to the latent options, recent work on HIL has focused on the development of Expectation-Maximization (EM) algorithms inspired by approaches such as the celebrated Baum-Welch algorithm for hidden Markov models (HMMs). In this work, we take a different approach and derive a new HIL framework inspired by the spectral method of moments for HMMs. The method of moments offers global and consistent convergence under mild regularity conditions, whilst only requiring a single sweep through the data set of state-action pairs, giving it a competitive run time.
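To make the "single sweep" claim concrete, the sketch below illustrates the general flavour of a spectral method-of-moments estimator for an HMM-like model: low-order observation moments are accumulated in one pass over the demonstrations, and a truncated SVD of the pairwise moment matrix exposes a subspace whose rank matches the assumed number of latent options. This is an illustrative sketch only, not the paper's algorithm; the discretization of (state, action) pairs into symbols and the names `trajectories`, `n_symbols`, and `n_options` are assumptions.

```python
# Minimal sketch (assumed setup, not the paper's method): single-pass
# estimation of first- and second-order observation moments, followed by a
# truncated SVD at the assumed number of latent options.
import numpy as np


def spectral_moments(trajectories, n_symbols, n_options):
    """trajectories: iterable of sequences of discretized (state, action) symbols."""
    P1 = np.zeros(n_symbols)                 # first moment:  P[x_t]
    P21 = np.zeros((n_symbols, n_symbols))   # second moment: P[x_{t+1}, x_t]

    # One sweep over the data set: accumulate empirical (co-)occurrence counts.
    n_singles, n_pairs = 0, 0
    for traj in trajectories:
        for t, x in enumerate(traj):
            P1[x] += 1
            n_singles += 1
            if t + 1 < len(traj):
                P21[traj[t + 1], x] += 1
                n_pairs += 1
    P1 /= max(n_singles, 1)
    P21 /= max(n_pairs, 1)

    # Truncated SVD of the pairwise moment matrix; under the usual rank
    # (regularity) conditions, its top n_options left singular vectors span
    # the subspace used by spectral HMM-style recovery.
    U, _, _ = np.linalg.svd(P21)
    return P1, P21, U[:, :n_options]


# Toy usage with random symbol sequences standing in for expert demonstrations.
rng = np.random.default_rng(0)
demos = [rng.integers(0, 6, size=50).tolist() for _ in range(20)]
P1, P21, U_k = spectral_moments(demos, n_symbols=6, n_options=2)
print(U_k.shape)  # (6, 2)
```

Because the moments are simple running sums, the pass over the data is linear in the total number of state-action pairs, which is the source of the competitive run time noted above.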
