Abstract

Given the complexity of a multimedia society and the inherently subjective task of describing images with words, a visual search application is a valuable tool. This work implements a Content-Based Image Retrieval (CBIR) application for texture images with the goal of comparing three deep convolutional neural networks (VGG-16, ResNet-50, and DenseNet-161), used as image descriptors by extracting global features from images. To measure similarity among images and rank them, we employed cosine similarity, Manhattan distance, Bray-Curtis dissimilarity, and Canberra distance. We confirm that global average pooling applied to convolutional layers provides good texture descriptors, and propose using it when extracting features from VGG-based models. Our best result uses the average pooling layer of DenseNet-161 as a 2208-dim feature vector together with Bray-Curtis dissimilarity. We achieved 73.09% mAP@1 and 76.98% mAP@5 on the Describable Textures Dataset (DTD) benchmark, adapted for image retrieval. Our mAP@1 result is comparable to the state-of-the-art classification accuracy (73.8%). We also investigate the impact on retrieval performance of reducing the number of feature components with PCA, and are able to compress the 2208-dim descriptor down to 128 components with a moderate drop of 3.3 percentage points in mAP@1.
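
As a rough illustration of the retrieval pipeline the abstract describes (not the authors' exact implementation), the sketch below extracts 2208-dim global-average-pooled descriptors from a pretrained DenseNet-161 and ranks gallery images by Bray-Curtis dissimilarity; the torchvision weights enum assumes torchvision >= 0.13, and the helper names (describe, rank) are our own.

    import torch
    import torch.nn.functional as F
    from torchvision import models
    from scipy.spatial.distance import braycurtis
    from sklearn.decomposition import PCA

    # Pretrained DenseNet-161; its convolutional trunk outputs 2208 channels.
    model = models.densenet161(weights=models.DenseNet161_Weights.IMAGENET1K_V1)
    model.eval()

    @torch.no_grad()
    def describe(batch):
        # batch: (N, 3, H, W) ImageNet-normalized images.
        fmap = F.relu(model.features(batch))   # (N, 2208, h, w) conv feature maps
        gap = F.adaptive_avg_pool2d(fmap, 1)   # global average pooling
        return gap.flatten(1).numpy()          # (N, 2208) global descriptors

    def rank(query_vec, gallery_vecs):
        # Lower Bray-Curtis dissimilarity = more similar; return sorted indices.
        d = [braycurtis(query_vec, g) for g in gallery_vecs]
        return sorted(range(len(d)), key=d.__getitem__)

    # Optional compression, as studied in the paper: project the 2208-dim
    # descriptors onto 128 PCA components fitted on the gallery set.
    # pca = PCA(n_components=128).fit(gallery_descriptors)
    # compact = pca.transform(gallery_descriptors)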
