Abstract

Multi-label image recognition is common in real-world scenarios, where handling overlapping and densely packed objects in complex scenes is crucial. In traffic scenes, for instance, pedestrians, various types of vehicles, and signage frequently overlap or appear in close proximity. A primary obstacle in leveraging label relationships to improve image classification is integrating label semantic topology information effectively with the image data itself. In this paper, we propose a novel framework, the Bipartite-driven Superimposed Dynamic Graph Convolutional Network (Bi-SDNet), augmented with a Mapping Alignment Module (MAM) and a Semantic Decoupling Module (SDM). Our approach first uses the MAM and SDM to decompose input features into representations that discern category label semantics at multiple scales. A Superimposed Dynamic Graph then captures content-aware category relationships for each image, modeling the relationships between these representations for the final recognition task. We conducted extensive experiments on publicly available benchmark datasets and on the traffic scene dataset WZ-traffic. The model achieved 87.5% mean average precision (mAP) on the MS-COCO dataset and 91% mAP on the WZ-traffic dataset. These results suggest that the proposed techniques are effective tools for improving multi-label recognition performance.
