Intuitively editing the appearance of materials from a single image is a challenging task, given the complexity and ambiguity of the interactions between light and matter. This problem has traditionally been addressed by estimating additional scene factors such as geometry or illumination, thus solving an inverse rendering problem in which the interaction of light and matter must be modelled. Instead, we present a single-image appearance editing framework that allows users to intuitively modify the material appearance of an object by increasing or decreasing high-level perceptual attributes that describe appearance (e.g., glossy or metallic). Our framework takes a single in-the-wild image as input, in which neither geometry nor illumination is controlled.
We rely on generative neural networks and, in particular, on Selective Transfer Generative Adversarial Networks (STGAN), which preserve high-frequency details from the input image in the edited one. To train our framework, we combine pairs of synthetic images, rendered with physically-based ray tracing algorithms, and their corresponding ratings of the high-level attributes, collected from humans through crowd-sourced user studies. Finally, although our method is trained on synthetic images, we demonstrate its applicability to synthetic video sequences and to real-world photographs, both downloaded from online catalogs and taken manually with our mobile phones.