Integrating topology beyond descriptions for zero-shot learning

Ziyi Chen, Yutong Gao, Congyan Lang, Lili Wei, Yidong Li, Hongzhe Liu, Fayao Liu

doi:10.1016/j.patcog.2023.109738

Abstract

Zero-shot learning (ZSL) aims to discriminate object categories by identifying their attributes, and it has received much attention for its ability to predict unseen categories without collecting training data for them. Recent works optimize model inference by mining the topology among categories and attributes, showing that topology learning is beneficial for ZSL. However, existing works focus almost exclusively on constructing semantic topological knowledge from textual descriptions. Though effective, this approach suffers from two deficiencies. First, the semantic gap between modalities makes it difficult for category attributes to accurately describe the corresponding visual characteristics, so the topology constructed in the semantic modality is distorted in the visual modality. Second, it is difficult to enumerate all the attributes hidden in images, so a topology mined only from the defined attributes is incomplete. We therefore propose a Cross-Modality Topology Propagation Matcher (CTPM), which constructs a more complete topology by collaboratively mining topological knowledge in both the visual and semantic modalities. Working at the dataset level, we construct sample-based visual topological knowledge from global image features to preserve the integrity of visual information. Meanwhile, we exploit the matching relationship between the visual and semantic modalities so that topological knowledge propagates effectively across modalities, allowing category/attribute reasoning to benefit fully from multi-modality topological knowledge. We validate the effectiveness of CTPM through extensive experiments and achieve state-of-the-art performance on four ZSL datasets.

Full Text