Joint Learning of Binocularly Driven Saccades and Vergence by Active Efficient Coding

Qingpeng Zhu, Bertram E. Shi, Jochen Triesch

doi:10.3389/fnbot.2017.00058

Abstract

This paper investigates two types of eye movements: vergence and saccades. Vergence eye movements are responsible for bringing the images of the two eyes into correspondence, whereas saccades drive gaze to interesting regions in the scene. Control of both vergence and saccades develops during early infancy. To date, these two types of eye movements have been studied separately. Here, we propose a computational model of an active vision system that integrates these two types of eye movements. We hypothesize that incorporating a saccade strategy driven by bottom-up attention will benefit the development of vergence control. The integrated system is based on the active efficient coding framework, which describes the joint development of sensory-processing and eye movement control to jointly optimize the coding efficiency of the sensory system. In the integrated system, we propose a binocular saliency model to drive saccades based on learned binocular feature extractors, which simultaneously encode both depth and texture information. Saliency in our model also depends on the current fixation point. This extends prior work, which focused on monocular images and saliency measures that are independent of the current fixation. Our results show that the proposed saliency-driven saccades lead to better vergence performance and faster learning in the overall system than random saccades. Faster learning is significant because it indicates that the system actively selects inputs for the most effective learning. This work suggests that saliency-driven saccades provide a scaffold for the development of vergence control during infancy.

Highlights

Biological vision systems are often active and rely on a number of eye movements to sense the environment
Binocular saliency model: the binocular saliency map is generated using binocular attention based on information maximization (BAIM), which we propose as a binocular extension of the AIM model of Bruce and Tsotsos (2009)
The coefficients hn(k) in Eq 17 are empirical estimates of the probability that the response falls into the k-th bin, computed over Psal patches. We considered both global (GBAIM) and local (LBAIM) versions of the binocular saliency map, which differ in the set of patches used to estimate hn(k)

Summary

Introduction

Biological vision systems are often active and rely on a number of eye movements to sense the environment. These vision systems have the ability to autonomously self-calibrate, but the underlying mechanisms are still poorly understood. Vergence eye movements are slow and disconjugate (the two eyes move in opposite directions). They serve to align the images acquired by the two eyes so that they can be binocularly fused. Saccadic eye movements are rapid and conjugate (the two eyes move in the same direction). They serve to direct gaze so that the fovea, the region with highest visual acuity, falls on objects of interest.

Methods

Results

Conclusion