Abstract

The Bag-of-Deep-Visual-Words (BoDVW) model has shown an advantage over Convolutional Neural Network (CNN) models in image classification tasks with a small number of training samples. An essential step in the BoDVW model is extracting deep features using an off-the-shelf CNN model as a feature extractor. Two deep feature extraction methods have been proposed in recent years. The first densely samples multi-scale image patches and converts them into deep features via a deep fully-connected layer. The second uses the output of a deep convolutional or pooling layer as the source of deep features. The second method is much more efficient, but it performs worse than the first in classification accuracy because the deep features it extracts come from receptive fields of a single size. To give the BoDVW model both high feature extraction efficiency and high classification accuracy, we propose enhancing the deep features extracted by the second method, at low added computational cost, by supplementing them with information obtained from receptive fields of different sizes. Concretely, we introduce a novel feature named the “feature difference (FD) vector”, which roughly preserves the information of multiple deep features extracted by convolutional layers with different receptive field sizes. Each deep feature is enhanced by combining it with an FD vector to form a combined feature, and the image representation vector of an image is generated from the combined features extracted from it. Our experimental results on three public datasets (15-Scenes, TF-Flowers, and NWPU-RESISC45) show that our method avoids the high computational cost of the first method while achieving comparable accuracy, which demonstrates its effectiveness.
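
Below is a minimal sketch of the second extraction route described above, using PyTorch and torchvision's VGG-16 as the off-the-shelf CNN. The abstract does not define the FD vector precisely, so the choice of layers (indices 21 and 28) and the difference-of-layer-activations computation here are purely illustrative assumptions, not the paper's exact method.

import torch
import torchvision.models as models

# Off-the-shelf CNN used only as a fixed feature extractor.
cnn = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()

def layer_activations(image, layer_indices):
    """Collect activations of selected convolutional layers in one forward pass."""
    acts, x = {}, image
    with torch.no_grad():
        for i, layer in enumerate(cnn):
            x = layer(x)
            if i in layer_indices:
                acts[i] = x
    return acts

image = torch.rand(1, 3, 224, 224)  # stand-in for a preprocessed input image
# Two deep conv layers with different receptive field sizes (assumed choice).
acts = layer_activations(image, layer_indices={21, 28})

# Each spatial position of the deepest layer yields one deep feature vector.
deep = acts[28]                          # shape (1, 512, 14, 14)
feats = deep.flatten(2).squeeze(0).t()   # 196 deep features of dimension 512

# Hypothetical FD-style vector: difference between the two layers' activations
# after pooling the shallower one onto the deeper layer's spatial grid.
shallow = torch.nn.functional.adaptive_avg_pool2d(acts[21], deep.shape[-2:])
fd = (deep - shallow).flatten(2).squeeze(0).t()  # one FD vector per position

# Combined features that would then be fed to the BoDVW encoding step.
combined = torch.cat([feats, fd], dim=1)         # shape (196, 1024)

Because all activations come from a single forward pass, this route adds only the cost of one pooling and one subtraction per image, which is consistent with the low added computational cost claimed above.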
