Abstract

There is an ever-increasing need for autonomous robots that are capable of adapting to and operating in challenging partially observable and stochastic environments. Standard techniques for autonomous learning in such environments are often fundamentally reliant on human-engineered features, one of the most important of which is an a priori specification of the agent’s state space. Designing an appropriate state space demands extensive domain knowledge, and even minor changes to the task or the agent might necessitate re-engineering. These limitations have given rise to end-to-end, predictive approaches, such as Predictive State Representations (PSRs) and our Stochastic Distinguishing Experiments (SDEs), that learn a representation of state encoded in the probabilities of key sequences of raw actions and observations (i.e., experiments the agent can perform). Discovering these experiments remains a key challenge, in part because existing techniques lack a formal relationship between predictive experiments and latent states in the agent’s model of its environment. In this paper, we extend our SDE representation into a novel hybrid latent-predictive cognitive architecture in which each latent state is created and uniquely represented by the result of a predictive experiment that statistically distinguishes it from other states. We prove that deterministic environments and a useful subclass of POMDP environments can be perfectly represented with equivalent compactness by such models, and we provide an active algorithm, based on the biologically inspired notion of surprise, for autonomously learning such models from experience in unknown environments. The agent begins by using only its observations as its state space. It then splits those states into a hierarchy of additional latent states whenever it is surprised by high entropy in the outcomes of repeatedly executing experiments that are automatically designed and selected to statistically disambiguate identical-looking states. We present experimental results demonstrating the feasibility of this learning procedure.
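The surprise-driven splitting step can be illustrated with a minimal sketch. It assumes that surprise is measured as the Shannon entropy of the outcomes observed when the same experiment is executed repeatedly from one state, and that a surprised state is split into one child latent state per distinct outcome. The function names, the 0.5-bit threshold, and the split rule are hypothetical illustrations, not the authors' algorithm.

```python
import math
from collections import Counter


def outcome_entropy(outcomes):
    """Shannon entropy (in bits) of the empirical distribution of outcomes."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    # p * log2(1/p) avoids a negative zero when a single outcome has p = 1.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def is_surprised(outcomes, threshold_bits=0.5):
    """Hypothetical surprise test: flag a state for splitting when repeated
    executions of the same experiment from it give high-entropy outcomes."""
    return outcome_entropy(outcomes) > threshold_bits


def split_state(state, outcomes):
    """Hypothetical split rule: one child latent state per distinct outcome."""
    distinct = sorted(set(outcomes))
    return {outcome: f"{state}.{i}" for i, outcome in enumerate(distinct)}


if __name__ == "__main__":
    # Each outcome is the tuple of observations seen while running one experiment
    # from an observation-labelled state "s" (assumed, toy data).
    trials = [("wall",), ("door",), ("wall",), ("door",)]
    if is_surprised(trials):
        print(split_state("s", trials))
```

In this toy run the two outcomes are equally likely (1 bit of entropy), so the state is flagged and split into `s.0` and `s.1`. A state whose experiment always yields the same outcome has zero entropy and is left alone.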
