Deep Neural Networks as a computational model for early vision: Lateral masking and contour integration

Yoram Bonneh, Ron Dekel

doi:10.1167/jov.20.11.1356


Open Access



Abstract

Background: Deep neural network (DNN) models developed for image classification have recently been suggested as biologically inspired models of different brain functions. Here we apply a standard visual DNN model to explore early-vision mechanisms of integration, as reflected in the psychophysical phenomena of lateral masking, contrast summation, and contour integration.
Methods: We used the standard ImageNet-trained VGG network model. The model correlate of perceptual distance was the L2 distance between the average responses to images corresponding to different stimulus conditions. Higher values of this metric (indicating larger changes in the DNN representation) correspond to better discrimination.
Results: For lateral masking, the model produced a close match to the basic behavioral data (Polat & Sagi 1993), with a facilitation of ~0.4 log units at a flanker distance of 2.5 wavelengths. It also showed, among other effects: (1) inhibition for very close flankers, (2) no facilitation for orthogonal flankers and decreasing facilitation with deviation from collinearity, (3) more facilitation for a vertical configuration than for horizontal and oblique ones, and (4) scaling of the effects with wavelength. Facilitation grew and the interaction range lengthened from the middle layers of the model upward, indicating hierarchical integration that possibly substitutes for the assumed lateral interactions in V1. For contrast summation, we replicated the configuration-dependent power-law summation (response vs. number of patches along a Gabor contour), which holds for smooth but not jagged contours (Bonneh and Sagi 1998). For Gabor contour detection in noise (Field et al. 1993), the model showed the known effects of spacing, contour smoothness, and scaling. For noise detection in natural images (Alam et al. 2014), perceptual thresholds were strongly correlated with model predictions (R=0.78, N=1080 images).
Conclusions: These findings demonstrate effortless replication in a DNN of classic findings concerning early human visual processing, suggesting convergent evolution of biological and artificial vision.
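
The distance metric described in the Methods can be illustrated with a minimal sketch. It assumes each stimulus condition is summarized by an array of layer responses with shape (images, units); the function name `representation_distance`, the array sizes, and the random stand-in data are hypothetical and are not taken from the authors' code. In an actual analysis, the arrays would hold VGG layer activations rather than random numbers.

```python
import numpy as np


def representation_distance(responses_a, responses_b):
    """L2 distance between the condition-averaged response vectors.

    Each argument is an array of shape (n_images, n_units): one row of
    layer responses per image belonging to that stimulus condition.
    """
    mean_a = np.asarray(responses_a, dtype=float).mean(axis=0)
    mean_b = np.asarray(responses_b, dtype=float).mean(axis=0)
    return float(np.linalg.norm(mean_a - mean_b))


# Stand-in data (assumption): random numbers in place of real activations.
rng = np.random.default_rng(0)
n_images, n_units = 20, 512
baseline = rng.normal(size=(n_images, n_units))
shifted = baseline + 0.5  # every unit responds 0.5 more in this condition

d_same = representation_distance(baseline, baseline)
d_shift = representation_distance(baseline, shifted)
print(d_same, d_shift)
```

Under this reading, a larger distance between two conditions (for example, target present vs. target absent) is interpreted as better discriminability, so comparing distances across flanker configurations would give the model analogue of a facilitation or inhibition effect.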
