Abstract

Learning visual patterns is a fundamental building block of visual perception on the path toward machine intelligence. By definition, a visual pattern is a discernible visual regularity in the world, whose compositional elements repeat as a whole in a predictable manner. Correspondingly, there are three fundamental tasks in pattern analysis, namely pattern recognition, pattern discovery, and pattern synthesis. In this chapter, we discuss how to leverage recent advances in deep generative models, such as variational auto-encoders (VAEs) and generative adversarial networks (GANs), to achieve more controllable synthesis of visual patterns, beyond random sampling. Compared to traditional statistical models such as Markov Random Fields, these deep learning-based methods target end-to-end pattern modeling by formulating it as a conditional image generation problem. We argue that the key to achieving controllable pattern modeling is to learn disentangled representations of the visual patterns, in which the various controlling factors can be explicitly pinpointed in the embedded visual representation. We start with the basics of generative modeling, followed by a detailed introduction to modern deep generative models for learning visual patterns. We then use several case studies, in style transfer, vision-language generation, and face synthesis, to illustrate how disentangled representations for controllable visual modeling and synthesis can be achieved in a weakly supervised fashion. Finally, we conclude with a discussion of how this may guide us toward more explainable deep learning-based models for visual perception.
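To make the conditional-generation formulation concrete, below is a minimal, hypothetical PyTorch sketch (not taken from the chapter; the class name, layer sizes, and dimensions are illustrative assumptions). It shows a generator G(z, c) that takes a random code z together with an explicit control code c. If c is disentangled, sweeping one dimension of c while holding z fixed should change exactly one visual factor of the output, which is the sense of "controllable synthesis" described above.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Toy conditional generator: maps a random code z and an explicit
    control code c (the candidate disentangled factors) to an image.
    All names and sizes here are illustrative, not from the chapter."""
    def __init__(self, z_dim=64, c_dim=8, img_size=32):
        super().__init__()
        self.img_size = img_size
        self.net = nn.Sequential(
            nn.Linear(z_dim + c_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 3 * img_size * img_size),
            nn.Tanh(),  # pixel values in [-1, 1]
        )

    def forward(self, z, c):
        # Concatenating z and c makes each dimension of c an explicit
        # handle on the output: with a disentangled c, varying one entry
        # should alter exactly one factor of the generated pattern.
        x = self.net(torch.cat([z, c], dim=1))
        return x.view(-1, 3, self.img_size, self.img_size)

# Controlled sampling: hold the random code fixed, sweep one control factor.
G = ConditionalGenerator()
z = torch.randn(1, 64).repeat(5, 1)        # same random "content" 5 times
c = torch.zeros(5, 8)
c[:, 0] = torch.linspace(-1.0, 1.0, 5)     # sweep a single control dimension
imgs = G(z, c)
print(imgs.shape)  # torch.Size([5, 3, 32, 32])
```

In an actual system the generator would be a VAE decoder or GAN generator trained so that c captures interpretable factors; the sweep above is the standard diagnostic for whether a factor has been successfully disentangled.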
