Abstract

Our visual environment is full of texture ("stuff" like cloth, bark, or gravel, as distinct from "things" like dresses, trees, or paths), and humans are adept at perceiving subtle variations in material properties. To investigate image features important for texture perception, we psychophysically compare a recent parametric model of texture appearance (convolutional neural network [CNN] model) that uses the features encoded by a deep CNN (VGG-19) with two other models: the venerable Portilla and Simoncelli model and an extension of the CNN model in which the power spectrum is additionally matched. Observers discriminated model-generated textures from original natural textures in a spatial three-alternative oddity paradigm under two viewing conditions: when test patches were briefly presented to the near-periphery ("parafoveal") and when observers were able to make eye movements to all three patches ("inspection"). Under parafoveal viewing, observers were unable to discriminate 10 of 12 original images from CNN model images, and remarkably, the simpler Portilla and Simoncelli model performed slightly better than the CNN model (11 textures). Under foveal inspection, matching CNN features captured appearance substantially better than the Portilla and Simoncelli model (nine compared to four textures), and including the power spectrum improved appearance matching for two of the three remaining textures. None of the models we test here could produce indiscriminable images for one of the 12 textures under the inspection condition. While deep CNN (VGG-19) features can often be used to synthesize textures that humans cannot discriminate from natural textures, there is currently no uniformly best model for all textures and viewing conditions.
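To make the matched statistics concrete, the following is a minimal sketch, assuming PyTorch and a pretrained VGG-19, of the kind of texture statistics involved: the CNN model (following Gatys, Ecker, & Bethge, 2015) summarizes VGG-19 feature maps by their Gram matrices, and the extended model additionally matches the Fourier power spectrum. The layer indices, Gram normalization, and spectrum weighting below are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torchvision.models as models

# Pretrained VGG-19 feature extractor (ImageNet normalization of the
# input is omitted here for brevity).
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()

# Indices into vgg of the conv layers matched in this sketch
# (conv1_1, conv2_1, conv3_1, conv4_1, conv5_1) -- an illustrative choice.
TEXTURE_LAYERS = [0, 5, 10, 19, 28]

def gram_matrices(image: torch.Tensor) -> list[torch.Tensor]:
    """One Gram matrix (channel-by-channel feature correlations) per
    matched VGG-19 layer, for a (1, 3, H, W) image tensor."""
    grams, x = [], image
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in TEXTURE_LAYERS:
            _, c, h, w = x.shape
            f = x.reshape(c, h * w)               # flatten spatial dimensions
            grams.append(f @ f.T / (c * h * w))   # normalized Gram matrix
        if i == TEXTURE_LAYERS[-1]:
            break
    return grams

def power_spectrum(image: torch.Tensor) -> torch.Tensor:
    """Fourier amplitude spectrum, averaged over color channels."""
    return torch.abs(torch.fft.fft2(image)).mean(dim=1)

def texture_loss(synth: torch.Tensor, original: torch.Tensor,
                 spectrum_weight: float = 0.0) -> torch.Tensor:
    """Squared difference between Gram matrices of the synthesized and
    original textures; spectrum_weight > 0 adds the power-spectrum term
    of the extended ("CNN + power spectrum") model."""
    loss = sum(((gs - go) ** 2).sum()
               for gs, go in zip(gram_matrices(synth), gram_matrices(original)))
    if spectrum_weight > 0:
        diff = power_spectrum(synth) - power_spectrum(original)
        loss = loss + spectrum_weight * (diff ** 2).sum()
    return loss
```

In synthesis procedures of this kind, a noise image is iteratively optimized (e.g., by gradient descent) to minimize such a loss against the original texture.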

Highlights

  • Textures are characterised by the repetition of smaller elements, sometimes with variation, to make up a pattern

  • Together with the results of Balas (2006), these results suggest that the Portilla and Simoncelli (PS) model better captures texture appearance in the periphery than in the fovea, and that the perceptual fidelity of the matching depends on the image or texture type

  • More complex convolutional neural network (CNN) models tend to yield poorer psychophysical discrimination performance (i.e., their syntheses are harder to distinguish from the original textures), and discrimination performance in the parafoveal condition is poorer than in the inspection condition

Introduction

Textures are characterised by the repetition of smaller elements, sometimes with variation, to make up a pattern. Significant portions of the visual environment can be thought of as textures (“stuff” as distinct from “things”; Adelson & Bergen, 1991): your neighbour’s pink floral wallpaper, the internal structure of dark German bread, the weave of a wicker basket, the gnarled bark of an old tree trunk, a bowl full of prawns ready for the barbie. Texture is an important material property whose perception is of adaptive value (Adelson, 2001; Fleming, 2014). Observers can, for example, estimate the relief of a surface as well as its gloss (Ho, Landy, & Maloney, 2008), separating the underlying spatial texture from potentially temporary characteristics like glossiness. Given the importance and ubiquity of visual textures, it is little wonder that they have received much scientific attention, from within vision science and in computer vision, graphics, and art (see Dakin, 2014; Landy, 2013; Pappas, 2013; Rosenholtz, 2014, for comprehensive recent reviews of this field).
