Abstract

Active Object Recognition (AOR) has been approached as an unsupervised learning problem, in which optimal trajectories for object inspection are not known in advance and must be discovered by reducing label uncertainty or by training with reinforcement learning. Such approaches suffer from local optima and offer no guarantees on the quality of their solutions. In this paper, we treat AOR as a Partially Observable Markov Decision Process (POMDP) and find near-optimal values and corresponding action-values on the training data using Belief Tree Search (BTS) on the AOR belief Markov Decision Process (MDP). AOR then reduces to the problem of transferring knowledge from these action-values to the test set. We train a Long Short-Term Memory (LSTM) network on these values to predict the best next action on training-set rollouts, and we show experimentally that our method generalizes well, exploring novel objects and novel views of familiar objects with high accuracy. We compare this supervised scheme against guided policy search and show that the LSTM network reaches higher recognition accuracy than both guided policy search and guided Neurally Fitted Q-iteration. We further investigate optimizing the observation function to increase the total collected reward during active recognition. In AOR, the observation function is known only approximately. We derive a gradient-based update for the observation function that increases the total expected reward. We show that by optimizing the observation function and retraining the supervised LSTM network, the AOR performance on the test set improves significantly.
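As background for the belief-MDP formulation above: in a POMDP the agent maintains a belief b over object states, updates it after each action a and observation o, and plans over beliefs rather than states. A minimal sketch of the standard belief update and belief-MDP value recursion, in generic notation assumed here rather than taken from the paper's own definitions:

b'(s') \propto O(o \mid s', a) \sum_{s} T(s' \mid s, a)\, b(s),
\qquad
V^{*}(b) = \max_{a} \Big[ R(b, a) + \gamma \sum_{o} \Pr(o \mid b, a)\, V^{*}\big(b'_{a,o}\big) \Big],

where R(b, a) = \sum_{s} b(s) R(s, a). Belief Tree Search approximates this recursion by expanding action-observation branches from the current belief, which is how the near-optimal action-values used as supervision for the LSTM are obtained.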
