Intrinsic Image Decomposition Using Paradigms.

David Forsyth, Jason J Rock

doi:10.1109/tpami.2021.3119551

Abstract

Intrinsic image decomposition is the task of mapping an image to albedo and shading. Classical approaches derive methods from spatial models. The modern literature stresses evaluation, by comparing predictions to human judgements ("lighter", "same as", "darker"). The best modern intrinsic image methods train a map from image to albedo using images rendered from computer graphics models and example human judgements. This approach yields practical methods, but obtaining rendered images can be inconvenient. Furthermore, the approach cannot explain how one could learn to recover intrinsic images without geometric, surface and illumination models, as people and animals appear to do. This paper describes a method that learns intrinsic image decomposition without seeing human annotations, rendered data, or ground truth data. Instead, the method relies on paradigms - spatial models of albedo and of shading. Rather than finding the "best" albedo and shading for an image via optimization, our approach trains a neural network on synthetic images. The synthetic images are constructed by multiplying albedos and shading fields sampled from our models. The network is subject to a novel smoothing procedure that ensures good behavior at short scales on real images. An averaging procedure ensures that reported albedo and shading are largely equivariant - different crops and scalings of an image will report the same albedo and shading at shared points. This averaging procedure controls long-scale error. The standard evaluation for an intrinsic image method is a WHDR score. Our method achieves WHDR scores competitive with those of strong recent methods allowed to see training WHDR annotations, rendered data, and ground truth data. Our method produces albedo and shading maps with attractive qualitative properties - for example, albedo fields do not suppress wood grain and represent narrow grooves in surfaces well. Because our method is unsupervised, we can compute estimates of the test/train variance of WHDR scores; these are quite large, and suggest that it is unsafe to rely on small differences in reported WHDR.
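
The WHDR score mentioned in the abstract can be illustrated with a short sketch. This is not the authors' evaluation code. It assumes the usual reading of WHDR as a weighted human disagreement rate: each annotation names two pixels, a human judgement of the first relative to the second ("lighter", "same as", "darker"), and a confidence weight, and the score is the weighted fraction of annotations that the predicted albedo contradicts. The function names, the annotation tuple layout, and the ratio threshold `delta=0.10` are assumptions made for this illustration.

```python
def predicted_judgement(r1, r2, delta=0.10):
    """Judge point 1 relative to point 2 from their predicted albedos.

    delta is an assumed ratio threshold: albedos whose ratio is within
    a factor of (1 + delta) of each other are reported as "same as".
    """
    if r2 / r1 > 1.0 + delta:
        return "darker"   # point 1 is darker than point 2
    if r1 / r2 > 1.0 + delta:
        return "lighter"  # point 1 is lighter than point 2
    return "same as"


def whdr(albedo, comparisons, delta=0.10, eps=1e-10):
    """Weighted fraction of human judgements that the albedo disagrees with.

    albedo      : 2D nested list of positive albedo values, indexed [row][col]
    comparisons : iterable of ((row1, col1), (row2, col2), label, weight)
    """
    total_weight = 0.0
    disagreeing_weight = 0.0
    for (y1, x1), (y2, x2), label, weight in comparisons:
        # Clamp to eps so that the ratios are always defined.
        r1 = max(albedo[y1][x1], eps)
        r2 = max(albedo[y2][x2], eps)
        if predicted_judgement(r1, r2, delta) != label:
            disagreeing_weight += weight
        total_weight += weight
    return disagreeing_weight / total_weight if total_weight > 0 else 0.0


# Toy data (hypothetical): a 2x2 albedo map and three weighted judgements.
toy_albedo = [[0.20, 0.80],
              [0.50, 0.52]]
toy_comparisons = [
    ((0, 0), (0, 1), "darker", 1.0),   # 0.20 vs 0.80: prediction agrees
    ((1, 0), (1, 1), "same as", 2.0),  # 0.50 vs 0.52: prediction agrees
    ((0, 1), (1, 0), "darker", 1.0),   # 0.80 vs 0.50: prediction says "lighter"
]
toy_score = whdr(toy_albedo, toy_comparisons)
```

On the toy data only the third judgement is contradicted, so the score is its weight divided by the total weight. Because the score is an average over a finite set of annotated pairs, it varies with which pairs are sampled, which is consistent with the abstract's caution about small differences in reported WHDR.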

Full Text