Mining Semantic Information With Dual Relation Graph Network for Multi-Label Image Classification

Wei Zhou,Tao Su,Haifeng Hu,Weitao Jiang,Dihu Chen

doi:10.1109/tmm.2023.3277279

Abstract

The purpose of multi-label image classification is to assign multiple labels for multiple objects presented in one image. Recent research efforts exploit graph convolution network (GCN) to learn the label co-occurrence dependencies for enhancing the semantic representation. Although these methods have achieved promising results, they can not capture the intrinsic correlation between objects in images and do not consider the inter-channel relationship. In addition, the previous methods treat each single image independently and fail to explore the relationship between different images. To address the above challenges, we propose a novel Dual Relation Graph Network (DRGN) model, which adopts a double branch structure to excavate rich semantic information from intra-image and cross-image simultaneously. Specifically, we first develop an intra-image channel-relation mining (ICM) module to mine the inter-channel relationship in features while learning the importance of different channels. Secondly, we design a new GCN-based intra-image spatial-relation exploring (ISE) module to capture the correlation between objects in individual image. Notably, ISE module and ICM module can complement and promote each other from the spatial and channel dimensions of images to improve the correlation between objects in individual image. Thirdly, we propose a novel GCN-based cross-image semantic learning (CSL) module to learn the semantic relationship between different images in the mini-batch. Through graph reasoning, our CSL module can iteratively refine input image features by acquiring common semantic information from other images in the mini-batch. Extensive experiments on the MS-COCO 2014, PASCAL VOC 2007, and VG-500 datasets demonstrate that the proposed DRGN model outperforms current state-of-the-art methods.

Full Text