Abstract

What motivates an action in the absence of a definite reward? Taking the case of visuomotor control, we consider a minimal control problem: how to select the next saccade, in a sequence of discrete eye movements, when the final objective is to better interpret the current visual scene. The visual scene is modeled here as a partially observed environment, with a generative model explaining how the visual data are shaped by action. This makes it possible to interpret different action selection metrics proposed in the literature, including the Salience, the Infomax and the Variational Free Energy, under a single information-theoretic construct, namely the view-based Information Gain. Pursuing this analytic track, two original action selection metrics, named the Information Gain Lower Bound (IGLB) and the Information Gain Upper Bound (IGUB), are then proposed. Showing either a conservative or an optimistic bias with respect to the Information Gain, they strongly simplify its calculation. An original fovea-based visual scene decoding setup is then proposed, with numerical experiments highlighting different facets of artificial fovea-based vision. A first and principal result is that state-of-the-art recognition rates are obtained with fovea-based saccadic exploration, using less than 10% of the original image's data. These satisfactory results illustrate the advantage of combining predictive control with accurate state-of-the-art predictors, namely a deep neural network. A second result is the sub-optimality of some classical action-selection metrics widely used in the literature, which is not manifest with finely-tuned inference models but becomes patent when coarse or faulty models are used. Last, a computationally-effective predictive model is developed using the IGLB objective, with pre-processed visual scan-paths read out from memory, bypassing computationally-demanding predictive calculations. This last simplified setting is shown to be effective in our case, exhibiting both competitive accuracy and good robustness to model flaws.

Highlights

  • In complement to goal-oriented activity, animal motor control relates to the search for sensory cues that allow the animal to better interpret its sensory environment and improve action efficacy

  • We take advantage here of the viewpoint-based variational encoding setup to propose a new quantification of the mutual information shared across different sensory fields, locally estimated with a view-based Information Gain metric

  • View-based mutual information and information gain: the sharing of information between two sensory fields x|u and x′|u′ should be quantified by their Mutual Information, as written out below
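
For reference, and using the view-based notation above, this is the standard conditional mutual information between the two views (a textbook identity, stated here for illustration rather than as the paper's exact derivation):

    I(x; x′ | u, u′) = H(x | u, u′) − H(x | x′, u, u′)

i.e., the expected reduction in uncertainty about the view x gathered at viewpoint u once the view x′ gathered at viewpoint u′ is known. The view-based Information Gain metric mentioned above can then be read as a local, per-observation estimate of this shared information.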


Summary

INTRODUCTION

In complement to goal-oriented activity, animal motor control relates to the search for sensory cues that allow the animal to better interpret its sensory environment and improve action efficacy. The “maximum effect” principle encourages actions that are well discriminated, i.e., that have a visible effect on the sensors. This is formally quantified by the “empowerment” information gain objective (Klyubin et al., 2005; Tishby and Polani, 2011), or by more informal measures of surprise, like the “Salience” metric (Itti and Baldi, 2005), or the different “curiosity” metrics, like the ones proposed in Schmidhuber (1991), Oudeyer and Kaplan (2008), and Pathak et al. (2017). An actual implementation of a sequential fovea-based scene decoding setup is developed in section 3.2, making it possible to quantitatively compare those different metrics and to propose new avenues toward parsimonious active vision through computationally-effective model-based prediction.
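
As a concrete illustration of such a sequential decoding loop, the next saccade can be chosen greedily as the fixation that maximizes a predicted action-selection score, after which the foveal view is read and the belief over scene interpretations is updated. The sketch below is a minimal, one-step-ahead scheme given for illustration only; the function and variable names (predict_score, observe_patch, update_posterior, candidate_fixations) are hypothetical placeholders, not the paper's actual implementation.

    # Minimal sketch of a greedy, one-step-ahead saccade selection loop.
    # All names below (predict_score, observe_patch, update_posterior,
    # candidate_fixations) are hypothetical placeholders, not the paper's API.
    import numpy as np

    def decode_scene(posterior, candidate_fixations, predict_score,
                     observe_patch, update_posterior, n_saccades=10):
        """Select saccades one at a time, each maximizing a predicted
        action-selection metric, and refine the belief after each view."""
        scanpath = []
        for _ in range(n_saccades):
            # Score every candidate fixation under the current belief.
            scores = [predict_score(posterior, u) for u in candidate_fixations]
            # Greedy choice of the next fixation point.
            u_next = candidate_fixations[int(np.argmax(scores))]
            # Execute the saccade and read the low-bandwidth foveal view.
            patch = observe_patch(u_next)
            # Update the posterior over scene interpretations with the new view.
            posterior = update_posterior(posterior, u_next, patch)
            scanpath.append(u_next)
        return posterior, scanpath

In this sketch, any of the metrics mentioned above (Salience, curiosity, empowerment, or the view-based Information Gain and its bounds) would play the role of predict_score, which is what a quantitative comparison between metrics amounts to.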

PRINCIPLES AND METHODS
A Mixed Generative Model
Active Vision and Predictive Control
Accuracy-Based Action Selection
View-Based Information Gain Metrics
Fovea-Based Visual Scene Decoding
Metrics Comparison
CONCLUSION