Abstract

Multi-head ensemble structures are used by many algorithms in applications such as deep metric learning. However, these structures have typically been designed empirically and simply, for example by giving every head the same structure, which limits the ensemble effect because the heads lack diversity. In this paper, to design the multi-head ensemble structure more carefully, we establish design concepts based on three structural factors: the feature layer, designed to extract an ensemble-favorable feature vector; the shared part, designed for memory savings; and the diverse multi-heads, designed for performance improvement. Through rigorous evaluation of variants built on these design concepts, we propose a heterogeneous double-head ensemble structure that drastically increases ensemble gain while saving memory. In verification experiments on image retrieval datasets, the proposed ensemble structure outperforms state-of-the-art algorithms by margins of over 5.3%, 6.1%, 5.9%, and 1.8% on CUB-200, Car-196, SOP, and Inshop, respectively.

Highlights

  • Deep metric learning has been successfully used in various applications related to computer vision, such as image retrieval [1]–[10], person re-identification [11], [12], and face verification [13]–[15]

  • Deep metric learning refers to the design of feature-extracting functions with deep neural networks so that the features of semantically similar images are close to each other

  • The proposed Heterogeneous Double-head Ensemble (HDhE) improves Recall@1 on CUB-200 by over 4% compared with its single-head version (Sh-HDhE)
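
To make the ideas in these highlights concrete, the following is a minimal NumPy sketch of a two-head ensemble evaluated with Recall@1. It is not the HDhE architecture, whose details are not given here. The random linear "shared part", the two toy heads, the aggregation by concatenating L2-normalized head outputs, and all function names (`l2_normalize`, `recall_at_1`, `ensemble_embedding`) are assumptions made purely for illustration. Recall@1 is computed in its usual sense: the fraction of items whose nearest other item has the same label.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit Euclidean length."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def recall_at_1(embeddings, labels):
    """Fraction of items whose nearest other item (by cosine similarity) shares their label."""
    emb = l2_normalize(embeddings)
    sim = emb @ emb.T
    np.fill_diagonal(sim, -np.inf)  # an item may not retrieve itself
    nearest = sim.argmax(axis=1)
    return float(np.mean(labels[nearest] == labels))


def ensemble_embedding(head_a, head_b):
    """Assumed aggregation: concatenate the L2-normalized outputs of two heads."""
    return np.concatenate([l2_normalize(head_a), l2_normalize(head_b)], axis=1)


# Toy data: 6 "images" represented as 8-d vectors, two per class (hypothetical).
rng = np.random.default_rng(0)
x = rng.normal(size=(6, 8))
labels = np.array([0, 0, 1, 1, 2, 2])

# Shared part: computed once and reused by both heads (this is where memory is saved).
w_shared = rng.normal(size=(8, 16))
shared = np.maximum(x @ w_shared, 0.0)

# Two structurally different heads (hypothetical): different widths and nonlinearities.
w_a = rng.normal(size=(16, 4))
w_b = rng.normal(size=(16, 6))
head_a = shared @ w_a
head_b = np.tanh(shared @ w_b)

ens = ensemble_embedding(head_a, head_b)
print(ens.shape)  # (6, 10)
print(recall_at_1(head_a, labels), recall_at_1(ens, labels))
```

With random, untrained weights the printed recall values carry no meaning; the sketch only shows how a single-head embedding and a two-head ensemble embedding would be scored with the same metric.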


Summary

Introduction

Deep metric learning has been successfully used in various computer vision applications, such as image retrieval [1]–[10], person re-identification [11], [12], and face verification [13]–[15]. Deep metric learning refers to the design of feature-extracting functions with deep neural networks so that the features of semantically similar images are close to each other. Ensembling is a method of ensuring robust performance by training diverse models and aggregating their prediction results. An ensemble requires two or more deep networks that extract diverse feature vectors [16]. Diverse feature vectors can be obtained by applying semantically diverse attention to the input image [11], [17] or by applying a loss that keeps the feature vectors away from each other [2], [18]. The goal of the ensemble is to generate a synergy between those diverse

