Mixer-Based Semantic Spread for Few-Shot Learning

Jun Cheng,Fusheng Hao,Liu Liu,Fengxiang He,Qieshi Zhang

doi:10.1109/tmm.2021.3123813

Abstract

Key semantics can come from everywhere on an image. Semantic alignment is a key part of few-shot learning but still remains challenging. In this paper, we design a Mixer-Based Semantic Spread (MBSS) algorithm that employs a <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">mixer</i> module to spread the key semantic on the whole image, so that one can directly compare the processed image pairs. We first adopt a convolutional neural network to extract features from both support and query images and separate each of them into multiple Local Descriptor-based Representations (LDRs). The LDRs are then fed into the <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">mixer</i> for semantic spread, where every LDR attracts complementary information from its peers. In this way, the objective semantic is made spread on the whole image in a data-driven manner. The overall pipeline is supervised by a voting-based loss, guaranteeing a good <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">mixer</i> . Visualization results validate the feasibility of our <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">mixer</i> . Comprehensive experiments on three benchmark datasets, miniImageNet, tieredImageNet, and CUB, show that our algorithm achieves the state-of-the-art performance in both 5-way 1-shot and 5-way 5-shot settings.

Full Text