Biologically inspired image classifier based on saccadic eye movement design for convolutional neural networks

Sweta Kumari, V Srinivasa Chakravarthy

doi:10.1016/j.neucom.2022.09.027

Abstract

We propose a model for image classification by attentional search. Analogous to how humans scan an image through a sequence of saccades, the model uses an attentional window much smaller than the target, which scans the target by a sequence of “saccades”, integrates the information acquired, and makes a classification decision. To process a sequence of attended image segments, the network must have memory, which is incorporated through three kinds of recurrent elements in the network architecture: Elman connections, Jordan connections, and flip-flop neurons. The architecture comprises three separate channels: the classifier network, the eye-position network, and the saccade network. Multiple attentional windows with different resolutions and a common center are given as input to the classifier network and the saccade network. A heat-map representation of the location of the attentional windows is given as input to the eye-position network. The saccade network predicts the next jump of the attentional windows with the help of reward signals received by the classifier network. The output features of all three channels are concatenated before terminating in two output layers, one representing the class prediction and the other the next saccade prediction. The model is trained using the deep Q-learning algorithm. The attentional search model is evaluated on the MNIST handwritten digit, Kannada MNIST, Medical-MNIST, OCTMNIST, and QuickDraw datasets. Translated and Cluttered Translated versions of each dataset are generated to perform classification based on local target search. The original datasets are used to demonstrate classification based on search with global target integration. We also evaluate saccade performance on the Extended Yale Face B database. In various problem cases, the model exhibits comparable or superior performance to a state-of-the-art recurrent attention model. Demo code is available in this link.
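As a rough illustration of the inputs described in the abstract, the following dependency-free Python sketch builds concentric attentional windows at several resolutions around a common center, a heat-map encoding the current eye position, and a standard one-step Q-learning target. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names, the 8-pixel base window, the scales (1, 2, 4), zero padding at the image border, average pooling, the Gaussian form of the heat-map, and the discount factor.

```python
import math


def crop(image, cy, cx, size):
    """Return a size x size patch centred near (cy, cx); pixels outside the image are 0."""
    h, w = len(image), len(image[0])
    half = size // 2
    return [
        [image[y][x] if 0 <= y < h and 0 <= x < w else 0.0
         for x in range(cx - half, cx - half + size)]
        for y in range(cy - half, cy - half + size)
    ]


def downsample(patch, factor):
    """Average-pool a square patch by an integer factor."""
    n = len(patch) // factor
    return [
        [sum(patch[i * factor + dy][j * factor + dx]
             for dy in range(factor) for dx in range(factor)) / factor ** 2
         for j in range(n)]
        for i in range(n)
    ]


def glimpses(image, cy, cx, base=8, scales=(1, 2, 4)):
    """Concentric windows of side base * s, each pooled back to base x base (assumed scheme)."""
    return [downsample(crop(image, cy, cx, base * s), s) for s in scales]


def eye_position_heatmap(h, w, cy, cx, sigma=2.0):
    """Gaussian bump marking the current fixation (assumed encoding of eye position)."""
    return [
        [math.exp(-((y - cy) ** 2 + (x - cx) ** 2) / (2 * sigma ** 2)) for x in range(w)]
        for y in range(h)
    ]


def q_target(reward, next_q_values, gamma=0.9, terminal=False):
    """Standard one-step Q-learning target: r + gamma * max_a' Q(s', a')."""
    return reward if terminal else reward + gamma * max(next_q_values)


# Toy 28 x 28 image with a single bright pixel at the fixation point.
image = [[float(y == 14 and x == 14) for x in range(28)] for y in range(28)]
windows = glimpses(image, 14, 14)            # three 8 x 8 windows, fine to coarse
heat = eye_position_heatmap(28, 28, 14, 14)  # peaks at the fixation point
```

In this sketch the coarser windows cover a larger field of view at lower resolution, so a saccade policy could in principle use them to locate a target that falls outside the finest window.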

Full Text