Abstract

Local image descriptors play a crucial role in many image processing tasks, such as object tracking, object recognition, panorama stitching, and image retrieval. In this paper, we focus on learning local image descriptors in an unsupervised way, using autoencoders and variational autoencoders. We perform a thorough comparative analysis of these two approaches, along with an in-depth analysis of the most relevant hyperparameters to guide their optimal selection. In addition to this analysis, we give insights into the difficulties, and the importance, of selecting the right evaluation techniques during unsupervised learning of local image descriptors. We explore the extent to which a simple perceptual metric during training can predict performance on tasks such as patch matching, retrieval, and verification. Finally, we propose an improvement to the encoder architecture that yields significant savings in memory complexity, especially in single-image tasks. As a proof of concept, we integrate our descriptor into an inpainting algorithm and illustrate its results when applied to the virtual restoration of master paintings. The source code required to reproduce the presented results has been made available as a repository on GitHub (https://github.com/nimpy/local-img-descr-ae).

Highlights

  • Finding a compact representation of a small patch in an image, i.e., finding a local image descriptor, is a crucial building block of various image processing tasks

  • To the best of our knowledge, no comprehensive comparison between the two methods exists in the literature, neither for learning local image descriptors nor as a general study. We offer such a thorough comparison between the autoencoder and variational autoencoder approaches for learning local image descriptors

  • The Multi-Scale Structural Similarity (MS-SSIM) loss function together with the Rectified Linear Unit (ReLU) activation yields the best descriptors in terms of performance on all tasks, for both autoencoders and variational autoencoders; in the case of VAEs, this holds across different values of the β parameter
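To make the MS-SSIM highlight concrete, the sketch below computes a simplified, single-scale SSIM from global patch statistics in NumPy. This is not the paper's training loss: the paper uses the multi-scale variant, and practical implementations use local (windowed) statistics; the constants `c1` and `c2` are the conventional stabilizers from the SSIM literature, assumed here for a [0, 1] intensity range.

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Simplified single-scale SSIM using global patch statistics.

    Sketch only: the paper trains with the multi-scale variant
    (MS-SSIM); this version just illustrates the luminance/contrast/
    structure comparison idea. Inputs are float arrays in [0, 1].
    """
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

rng = np.random.default_rng(0)
patch = rng.random((32, 32))
identical = ssim_global(patch, patch)        # 1.0 for identical patches
different = ssim_global(patch, np.zeros_like(patch))  # strictly below 1
```

As a training objective, one would minimize `1 - SSIM`, so that a perfect reconstruction (SSIM = 1) gives zero loss.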


Summary

INTRODUCTION

Finding a compact representation of a small patch in an image, i.e., finding a local image descriptor, is a crucial building block of various image processing tasks. In our previous work, we proposed approaches for learning local image descriptors based on convolutional autoencoders [20], [21] and variational autoencoders [22]. Although, in our experience, both AEs and VAEs have shown promising results for learning local image descriptors, a thorough comparative analysis is still missing. Encouraged by a recent result from [38], which demonstrated the benefits of using a perceptual loss function with autoencoders that learn embeddings for downstream prediction tasks, we include perceptual loss in our analysis of AE-based local image descriptors. This proved to improve the performance of the descriptors. To the best of our knowledge, we are the first to use a perceptual loss when training autoencoders to learn a local image descriptor.
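The core idea behind a perceptual loss is to compare reconstructions to targets in a feature space rather than pixel space. The sketch below illustrates this in plain NumPy: a few fixed convolution filters stand in for the frozen feature extractor (in practice, early layers of a pretrained CNN would be used); the random filters here are an assumption made purely to keep the example self-contained, not the extractor used in the paper.

```python
import numpy as np

def conv2d_valid(img, kern):
    """Naive valid-mode 2-D correlation, sufficient for this sketch."""
    h, w = kern.shape
    H, W = img.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (img[i:i + h, j:j + w] * kern).sum()
    return out

def perceptual_loss(x, y, filters):
    """Mean squared error between feature maps rather than raw pixels.

    'filters' stands in for a fixed feature extractor; each filter is
    applied to both images and the ReLU feature maps are compared.
    """
    loss = 0.0
    for k in filters:
        fx = np.maximum(conv2d_valid(x, k), 0)  # ReLU feature map of x
        fy = np.maximum(conv2d_valid(y, k), 0)  # ReLU feature map of y
        loss += ((fx - fy) ** 2).mean()
    return loss / len(filters)

rng = np.random.default_rng(0)
filters = [rng.standard_normal((3, 3)) for _ in range(4)]
patch = rng.random((16, 16))
zero_loss = perceptual_loss(patch, patch, filters)       # 0 for identical input
pos_loss = perceptual_loss(patch, np.zeros_like(patch), filters)
```

During autoencoder training, this term would replace (or complement) a pixel-wise reconstruction loss, so the network is penalized for losing perceptually salient structure rather than exact pixel values.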

AUTOENCODERS
OVERVIEW OF THE HYPERPARAMETERS
THE EXPERIMENTAL SETUP
EMPIRICAL RESULTS FOR HYPERPARAMETER SELECTION
EVALUATING LOCAL IMAGE DESCRIPTORS
EMPIRICAL RESULTS FOR EVALUATION METRICS
REDUCING COMPUTATIONAL MEMORY USING INTERMEDIATE REPRESENTATION
EVALUATING THE PROPOSED IR ARCHITECTURE
INPAINTING – PROOF OF CONCEPT FOR THE PROPOSED ARCHITECTURE
CONCLUSION