Abstract

In this paper, we examine two strategies for boosting the performance of ensembles of Siamese networks (SNNs) for image classification using two loss functions (Triplet and Binary Cross Entropy) and two methods for building the dissimilarity spaces (FULLY and DEEPER). With FULLY, the distance between a pattern and a prototype is calculated by comparing two images using the fully connected layer of the Siamese network. With DEEPER, each pattern is described using a deeper layer combined with dimensionality reduction. The basic design of the SNNs takes advantage of supervised k-means clustering for building the dissimilarity spaces that train a set of support vector machines, which are then combined by sum rule for a final decision. The robustness and versatility of this approach are demonstrated on several cross-domain image data sets: a portrait data set, two bioimage data sets, and two animal vocalization data sets. Results show that the strategies employed in this work to increase the performance of dissimilarity image classification using SNNs are closing the gap with standalone CNNs. Moreover, when our best system is combined with an ensemble of CNNs, the resulting performance is superior to that of the CNN ensemble alone, demonstrating that our new strategy extracts additional information.

Highlights

  • Interest in classification systems based on (dis)similarity spaces is resurging

  • In this paper, we examine two strategies for boosting the performance of ensembles of Siamese networks (SNNs) for image classification using two loss functions (Triplet and Binary Cross Entropy) and two methods for building the dissimilarity spaces (FULLY and DEEPER)

  • The basic design of the Siamese Neural Network (SNN) takes advantage of supervised k-means clustering for building the dissimilarity spaces that train a set of support vector machines, which are combined by sum rule for a final decision



Introduction

Interest in classification systems based on (dis)similarity spaces is resurging. Unlike the more common technique of classifying samples within a feature space, (dis)similarity classification estimates the class of an unknown pattern by examining its similarities and dissimilarities to a set of training samples, together with the pairwise (dis)similarities among those samples. An improved version of this method was developed for generic image classification in [14], where dissimilarity spaces were produced by a set of clustering methods and a set of SNNs with different CNN backbones. This approach was shown to compete well against state-of-the-art classifiers on several image data sets and obtained the highest classification score on one of them. This work further expands [14] by proposing additional techniques for improving the performance of an ensemble of SNNs. As in the earlier work, each Siamese network, built on one of eight different CNN topologies, generates a dissimilarity space whose features train an SVM, and the SVMs are combined by sum rule. The versatility and robustness of the best ensemble developed with these techniques are demonstrated on five cross-domain image data sets representing medical imaging problems, animal vocalizations (spectrograms), and portrait images.
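The pipeline described above can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the random vectors stand in for embeddings produced by the Siamese networks' CNN backbones, the number of prototypes per class and the two-model "ensemble" are arbitrary choices, and all helper names are hypothetical. It shows supervised k-means (clustering each class separately to obtain prototypes), the dissimilarity representation (distances from a sample to every prototype), and sum-rule fusion of SVM probability scores.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy stand-ins for SNN embeddings; in the paper these would come
# from the Siamese networks' CNN backbones.
X_train = rng.normal(size=(100, 32))
y_train = rng.integers(0, 2, size=100)
X_test = rng.normal(size=(20, 32))

def prototypes_per_class(X, y, k=3):
    """Supervised k-means: cluster each class separately, pool the centroids."""
    protos = []
    for c in np.unique(y):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X[y == c])
        protos.append(km.cluster_centers_)
    return np.vstack(protos)

def dissimilarity_space(X, protos):
    """Represent each sample by its Euclidean distance to every prototype."""
    return np.linalg.norm(X[:, None, :] - protos[None, :, :], axis=2)

protos = prototypes_per_class(X_train, y_train)   # (6, 32): 2 classes x 3 prototypes
D_train = dissimilarity_space(X_train, protos)    # (100, 6) dissimilarity features
D_test = dissimilarity_space(X_test, protos)

# One SVM per backbone in the real ensemble; two copies here for illustration.
scores = np.zeros((len(X_test), 2))
for seed in (0, 1):
    svm = SVC(probability=True, random_state=seed).fit(D_train, y_train)
    scores += svm.predict_proba(D_test)           # sum rule fusion

y_pred = np.argmax(scores, axis=1)
```

In the actual method the distance is computed by the Siamese network itself (FULLY) or on a deeper layer after dimensionality reduction (DEEPER), rather than by the plain Euclidean norm used in this sketch.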

Proposed Approach
Loss Functions
Adam Variants
Experimental Results
Method
Conclusions
