Abstract

Features play a crucial role in computer vision. Initially designed to detect salient elements by means of handcrafted algorithms, features are now often learned by different layers in convolutional neural networks (CNNs). This paper develops a generic computer vision system based on features extracted from trained CNNs. Multiple learned features are combined into a single structure to work on different image classification tasks. The proposed system was derived by testing several approaches for extracting features from the inner layers of CNNs and using them as inputs to support vector machines (SVMs), which are then combined by sum rule. Several dimensionality reduction techniques were tested to reduce the high dimensionality of the inner layers so that they can work with SVMs. The empirically derived generic vision system, based on applying a discrete cosine transform (DCT) separately to each channel, is shown to significantly boost the performance of standard CNNs across a large and diverse collection of image data sets. In addition, an ensemble of different topologies taking the same DCT approach and combined with global mean thresholding pooling obtained state-of-the-art results on a benchmark virus image data set.
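
As a rough illustration of the pipeline described above, the sketch below (assuming PyTorch/torchvision, SciPy, and scikit-learn) reads activations from an inner layer of a pretrained CNN, applies a 2-D DCT separately to each channel, keeps the low-frequency coefficients, and feeds the result to an SVM whose scores can be fused by sum rule. The chosen layer, the 8x8 coefficient block, and the linear SVM are illustrative assumptions, not the authors' exact configuration.

```python
# Minimal sketch (not the authors' code) of: inner-layer CNN features ->
# per-channel 2-D DCT -> SVM, with sum-rule fusion of classifier scores.
import numpy as np
import torch
import torchvision.models as models
from scipy.fft import dctn
from sklearn.svm import SVC

cnn = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
cnn.eval()

activations = {}
def hook(module, inputs, output):
    activations["feat"] = output.detach()

# Tap an inner stage of the network (layer3 is an illustrative choice,
# not necessarily the layer used in the paper).
cnn.layer3.register_forward_hook(hook)

def dct_features(image_batch, keep=8):
    """Run the CNN, take the hooked activations (N, C, H, W), apply a 2-D DCT
    to each channel, and keep the top-left keep x keep (low-frequency) block."""
    with torch.no_grad():
        cnn(image_batch)
    feat = activations["feat"].cpu().numpy()
    coeffs = dctn(feat, axes=(2, 3), norm="ortho")[:, :, :keep, :keep]
    return coeffs.reshape(feat.shape[0], -1)

# Hypothetical usage: images_train/images_test are preprocessed image tensors.
# clf = SVC(kernel="linear", probability=True).fit(dct_features(images_train), y_train)
# scores = clf.predict_proba(dct_features(images_test))
# With several such classifiers (different layers or CNN topologies), the
# sum rule is simply: predictions = np.argmax(sum(score_matrices), axis=1)
```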

Highlights

  • We obtained better results by tuning the convolutional neural networks (CNNs) on each training set, without principal component analysis (PCA) processing and with the methods applied locally (see the sketch after this list)

  • Most of the results reported in the following tables for the dimensionality reduction methods are based on tuning the CNNs without PCA postprocessing and with the local application of methods

  • The best performance for TunLayer-x was obtained with x = 3; we report, for comparison purposes, the performance of Layer-3 on the CNN pretrained on ImageNet without tuning on the given data sets
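
As a rough sketch of the distinction drawn in the first highlight, the snippet below contrasts PCA applied globally to the flattened layer activations with PCA applied locally, here taken to mean separately per channel; this reading of "local" is an assumption made for illustration, as are the array sizes and component counts.

```python
# Global vs. "local" (per-channel) application of a dimensionality reduction
# method, illustrated with PCA on synthetic activations of shape (N, C, H, W).
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
acts = rng.standard_normal((200, 64, 14, 14))   # N images, C channels, H x W maps
N, C, H, W = acts.shape

# Global: one PCA fitted on the full flattened feature vector of every image.
global_feats = PCA(n_components=50).fit_transform(acts.reshape(N, -1))

# Local: an independent PCA per channel, concatenating the reduced channels.
local_parts = [
    PCA(n_components=5).fit_transform(acts[:, c].reshape(N, -1)) for c in range(C)
]
local_feats = np.concatenate(local_parts, axis=1)
print(global_feats.shape, local_feats.shape)    # (200, 50) (200, 320)
```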

Introduction

Extracting salient descriptors from images is the mainstay of many computer vision systems. These handcrafted descriptors are tailored to overcome specific problems in image classification, with the goal of achieving the best possible classification accuracy while maintaining computational efficiency. Some descriptors, such as the scale-invariant feature transform (SIFT) [1], are valued for their robustness but can be too computationally expensive for practical purposes. Variants of popular handcrafted descriptors, such as fast variants of SIFT [2], continue to be created in an attempt to overcome these inherent shortcomings.
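
For contrast with the learned CNN features discussed above, the minimal sketch below (assuming OpenCV 4.4+ with SIFT available; the file name is a placeholder) extracts handcrafted SIFT descriptors from a grayscale image.

```python
import cv2

# Load an image in grayscale and extract handcrafted SIFT keypoints/descriptors.
image = cv2.imread("example.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)
# descriptors is a (num_keypoints, 128) array, one 128-D vector per keypoint.
print(len(keypoints), descriptors.shape)
```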
