Abstract

Modeling natural images is a pressing problem in computer vision. Generative flow-based models, which use maximum likelihood estimation to learn the distribution of multidimensional vectors, provide a flexible and scalable architecture for describing images. Such models are computationally efficient, and their internal representation of images makes it possible to change specific features of the generated data. However, flow-based models typically generate images of much lower quality than modern autoregressive models. This article proposes a probabilistic flow-based model whose main idea is the consistent use of a combination of different pairs of binary masks at multiple levels of spatial resolution. This innovation improves the capture of both local and global image features. The proposed architecture scales easily, so the model can be used to generate images of various sizes, including high-resolution images, without loss of training stability. The model's performance was demonstrated on four image datasets with different resolutions. Significant improvements were achieved on standard metrics for evaluating the quality of generated images, narrowing the quality gap between flow-based and autoregressive models.
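The role of a pair of binary masks in a flow-based model can be illustrated with a minimal sketch. The abstract does not specify the mask patterns or the network, so everything here is an assumption made for illustration: a checkerboard mask and its complement (a pattern used in coupling-layer flows such as RealNVP), an affine coupling transform, and a hypothetical `toy_net` standing in for a learned network. It is not the architecture proposed in the article.

```python
import math

def checkerboard_mask(height, width, parity=0):
    """Binary mask with 1 where (row + col) % 2 == parity (assumed pattern)."""
    return [[1 if (r + c) % 2 == parity else 0 for c in range(width)]
            for r in range(height)]

def complement(mask):
    """The paired mask: every 0 becomes 1 and every 1 becomes 0."""
    return [[1 - m for m in row] for row in mask]

def toy_net(kept):
    """Hypothetical stand-in for a learned network.

    Returns a per-pixel log-scale and shift that depend only on the
    masked-in ("kept") pixels, which is what makes the layer invertible.
    """
    h, w = len(kept), len(kept[0])
    mean = sum(sum(row) for row in kept) / (h * w)
    log_s = [[math.tanh(mean)] * w for _ in range(h)]
    shift = [[mean] * w for _ in range(h)]
    return log_s, shift

def coupling_forward(x, mask):
    """Affine coupling: pixels with mask == 1 pass through unchanged,
    the rest are scaled and shifted. Returns (y, log|det Jacobian|)."""
    h, w = len(x), len(x[0])
    kept = [[x[r][c] * mask[r][c] for c in range(w)] for r in range(h)]
    log_s, shift = toy_net(kept)
    y = [row[:] for row in x]
    log_det = 0.0
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                y[r][c] = x[r][c] * math.exp(log_s[r][c]) + shift[r][c]
                log_det += log_s[r][c]
    return y, log_det

def coupling_inverse(y, mask):
    """Exact inverse: the kept pixels are unchanged in y, so the same
    log-scale and shift can be recomputed and undone."""
    h, w = len(y), len(y[0])
    kept = [[y[r][c] * mask[r][c] for c in range(w)] for r in range(h)]
    log_s, shift = toy_net(kept)
    x = [row[:] for row in y]
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                x[r][c] = (y[r][c] - shift[r][c]) * math.exp(-log_s[r][c])
    return x

def flow_forward(x, masks):
    """Apply one coupling layer per mask, accumulating the log-determinant."""
    total = 0.0
    for mask in masks:
        x, log_det = coupling_forward(x, mask)
        total += log_det
    return x, total

def flow_inverse(y, masks):
    """Undo the layers in reverse order."""
    for mask in reversed(masks):
        y = coupling_inverse(y, mask)
    return y

# Usage: a mask and its complement together transform every pixel once.
first = checkerboard_mask(2, 2)
mask_pair = [first, complement(first)]
image = [[0.1, 0.2], [0.3, 0.4]]
latent, log_det = flow_forward(image, mask_pair)
restored = flow_inverse(latent, mask_pair)
```

Using the mask together with its complement means every pixel is transformed in one of the two layers, while the accumulated log-determinant is the term that maximum likelihood training of a flow needs alongside the prior density of the latent. Applying such pairs at several spatial resolutions, as the abstract describes, would require additional reshaping steps that are not shown here.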
