Abstract

Supervised deep learning models have achieved strong results in image classification when trained on large numbers of labeled samples. In practice, however, many categories have only a few labeled training samples, and some have none at all. Zero-shot learning greatly reduces the dependence of image classification models on labeled training samples. Nevertheless, learning the similarity between visual features and semantic features with a predefined, fixed metric (e.g., Euclidean distance) is limiting, and the mapping process suffers from the semantic gap problem. To address these problems, this paper proposes a new zero-shot image classification method based on an end-to-end learnable deep metric. First, common space embedding is adopted to map the visual features and semantic features into a common space. Second, an end-to-end learnable deep metric, namely the relation network, is used to learn the similarity between visual features and semantic features. Finally, images from unseen classes are classified according to the similarity score. Extensive experiments on four datasets indicate the effectiveness of the proposed method.

Highlights

  • Thanks to the development of deep learning models, image classification and image recognition have made continuous progress

  • Inspired by the relation network model, we propose a new zero-shot learning (ZSL) method based on a learnable deep metric

  • The proposed method, ZIC-LDM, is a zero-shot image classification method based on a learnable deep metric

Summary

Introduction

Thanks to the development of deep learning models, image classification and image recognition have made continuous progress. Sandouk et al. [13] used the Euclidean distance between embedded concepts in the concept embedding space to reflect semantic similarity; however, such a simple metric is predefined and cannot be learned. To overcome these limitations, Sung et al. [14] proposed the relation network (RN) model, which learns an end-to-end deep metric that compares visual features and semantic features through relation scores. The proposed method, ZIC-LDM, learns the correlation between visual features and semantic features in the common space with a learnable deep metric, and it adjusts this correlation end-to-end in a data-driven way. This can greatly alleviate the semantic gap problem caused by the inconsistency between the manifolds of visual features and semantic features. Experiments on widely used datasets indicate that ZIC-LDM achieves better zero-shot image classification performance than the other methods compared.
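
The contrast between a fixed metric and a learnable one can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the layer sizes, the toy three-dimensional vectors, and the random untrained weights are all assumptions made for illustration, and both inputs are assumed to have already been mapped into the common space. The fixed-metric baseline picks the class whose semantic vector is nearest in Euclidean distance, while the relation-style scorer passes the concatenated pair through a small network and picks the class with the highest score.

```python
import math
import random


def linear(x, weights, bias):
    """Affine map: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def init_layer(rng, n_out, n_in):
    """Hypothetical initialisation: small random weights, zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def euclidean_classify(visual, class_semantics):
    """Fixed metric: choose the class with the nearest semantic vector."""
    return min(class_semantics,
               key=lambda c: math.dist(visual, class_semantics[c]))


def relation_score(visual, semantic, params):
    """Learnable metric: a two-layer network on the concatenated pair.

    Returns a similarity score in (0, 1). The parameters would normally
    be trained end-to-end; here they are random (assumption).
    """
    (w1, b1), (w2, b2) = params
    hidden = relu(linear(visual + semantic, w1, b1))
    return sigmoid(linear(hidden, w2, b2)[0])


def relation_classify(visual, class_semantics, params):
    """Score every candidate class and return the best label and all scores."""
    scores = {c: relation_score(visual, s, params)
              for c, s in class_semantics.items()}
    return max(scores, key=scores.get), scores


# Toy common-space vectors for two unseen classes (made-up values).
class_semantics = {"zebra": [1.0, 0.0, 1.0], "whale": [0.0, 1.0, 0.0]}
visual = [0.9, 0.1, 0.8]

rng = random.Random(0)
params = (init_layer(rng, 4, 6), init_layer(rng, 1, 4))

baseline_label = euclidean_classify(visual, class_semantics)
learned_label, scores = relation_classify(visual, class_semantics, params)
```

With untrained weights the relation scores carry no meaning; the point of the sketch is only that the comparison function has parameters that training can adjust, whereas the Euclidean baseline has none. Training of the scorer is omitted here.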

Zero-Shot Learning
Meta Learning
Semantic Features
Similarity Measure for Zero-Shot Image Classification
Task Definition
Relation Module
Common Space Embedding Module
Objective Function
Model Implementation
Zero-Shot Image Classification
Generalized Zero-Shot Image Classification
Dataset and Settings
Traditional Zero-Shot Image Classification
Generalized
Loss Convergence Analysis
Distance Metric Study
Findings
Conclusions