Abstract

Large-scale labeled datasets are generally necessary for successfully training a deep neural network in the computer vision domain. In order to avoid the costly and tedious work of manually annotating image datasets, self-supervised learning methods have been proposed to learn general visual features automatically. In this paper, we first focus on image colorization with generative adversarial networks (GANs) because of their ability to generate the most realistic colorization results. Then, via transfer learning, we use this as a proxy task for visual understanding. Particularly, we propose to use conditional GANs (cGANs) for image colorization and transfer the gained knowledge to two other downstream tasks, namely, multilabel image classification and semantic segmentation. This is the first time that GANs have been used for self-supervised feature learning through image colorization. Through extensive experiments with the COCO and Pascal datasets, we show an increase of 5% for the classification task and 2.5% for the segmentation task. This demonstrates that image colorization with conditional GANs can boost other downstream tasks’ performance without the need for manual annotation.
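As a hedged illustration of the colorization pretext setup described above (not the authors' exact pipeline; the channel split, function name, and loss sketch below are assumptions), a training pair for a conditional GAN can be derived from any unlabeled RGB image: the generator is conditioned on a grayscale channel and trained to predict the missing color information.

```python
import numpy as np

def make_colorization_pair(rgb):
    """Split an RGB image (H, W, 3), values in [0, 1], into a
    conditional-GAN training pair: the grayscale input the generator
    is conditioned on, and the color target it must predict.
    This uses a simplified luminance/chroma split; a real pipeline
    might instead convert to the Lab color space."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # Luminance (ITU-R BT.601 weights) serves as the generator input.
    y = 0.299 * r + 0.587 * g + 0.114 * b
    # Chroma residuals (rgb minus luminance) serve as the target.
    target = rgb - y[..., None]
    return y[..., None], target

# The cGAN objective would then pair this with an adversarial loss,
# schematically:  L = E[log D(y, c)] + E[log(1 - D(y, G(y)))],
# optionally plus an L1 reconstruction term on the predicted colors.
rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
cond, target = make_colorization_pair(img)
print(cond.shape, target.shape)  # (64, 64, 1) (64, 64, 3)
```

Because no label is needed, any image corpus such as COCO can supply such pairs for free; the learned generator encoder is then the candidate for transfer to classification or segmentation.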

Highlights

  • It has been shown that deep convolutional neural networks (CNNs) have a great ability to learn features from visual data

  • The code for training and evaluating the image colorization generative adversarial networks (GANs) is publicly available in the following repository [48], and the code for transferring the gained knowledge to other downstream tasks is available at [49]

  • We explored image colorization as a proxy task for visual understanding for the first time



Introduction

It has been shown that deep convolutional neural networks (CNNs) have a great ability to learn features from visual data. Because of this, they have been consistently used as a base for many computer vision tasks such as image classification [1], object detection [2,3], semantic and instance segmentation [4,5], image captioning [6], and more. The success of deep learning models greatly depends on the amount of data used for their training, and annotating that data can be prohibitively expensive. Annotating medical images, for example, requires the contribution of an expert in the field. Another example domain is image segmentation, where creating annotations is especially tedious manual work.
