Abstract

The Class Activation Map (CAM) is widely used to generate pseudo-labels for Weakly Supervised Semantic Segmentation (WSSS). However, CAM does not adequately model foreground-independent information, which makes it prone to false-positive pixels. In this paper, we propose a Wave-like Class Activation Map (WaveCAM) that alleviates this problem from the perspective of representation fusion and dynamic representation aggregation. Specifically, WaveCAM comprises three parts: foreground-aware representation modeling, which enhances the perception of foreground information; foreground-independent representation modeling, which enhances the perception of foreground-independent information; and a representation-adaptive fusion module, which fuses the two representations. After initialization, both representations are expressed as wave functions with amplitude and phase, so that they can dynamically aggregate representations and extract semantic information. They are then fused by the adaptive fusion module to produce an output rich in semantic information. Extensive experiments on the PASCAL VOC 2012 and MS COCO 2014 datasets validate that WaveCAM can be easily embedded into both multi-stage and end-to-end WSSS frameworks and achieves state-of-the-art performance. The code is available at https://github.com/Rongtao-Xu/RepresentationLearning/tree/main/WaveCAM-TMM2023
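The abstract does not give the exact formulas for the wave representation or the fusion module. The following is a minimal sketch, assuming the amplitude-phase token representation used in phase-aware token mixing (as in Wave-MLP), where each token is encoded as z = A·exp(iθ) and tokens are aggregated by taking the real part of a weighted sum. The functions `to_wave`, `aggregate`, and `adaptive_fuse` are hypothetical names chosen for illustration, and the sigmoid-gated fusion is an assumed stand-in for the representation-adaptive fusion module, not the WaveCAM implementation.

```python
import cmath
import math
from typing import List, Sequence


def to_wave(amplitudes: Sequence[float], phases: Sequence[float]) -> List[complex]:
    """Encode each token as a complex wave z = A * exp(i * theta)."""
    return [a * cmath.exp(1j * theta) for a, theta in zip(amplitudes, phases)]


def aggregate(waves: Sequence[complex], weights: Sequence[Sequence[float]]) -> List[float]:
    """Phase-aware token mixing (assumed form): o_j = Re(sum_k W[j][k] * z_k).

    Tokens with similar phases reinforce each other, while tokens with
    opposite phases cancel, so the result depends on phase, not only amplitude.
    """
    return [sum(w * z for w, z in zip(row, waves)).real for row in weights]


def adaptive_fuse(
    fg: Sequence[float], bg: Sequence[float], gate_logits: Sequence[float]
) -> List[float]:
    """Hypothetical gated fusion: g * fg + (1 - g) * bg, with g = sigmoid(logit).

    fg stands for the foreground-aware representation and bg for the
    foreground-independent one; in a trained model the logits would be learned.
    """
    fused = []
    for f, b, logit in zip(fg, bg, gate_logits):
        g = 1.0 / (1.0 + math.exp(-logit))
        fused.append(g * f + (1.0 - g) * b)
    return fused


if __name__ == "__main__":
    mix = [[1.0, 1.0]]  # one output token that sums two input tokens
    in_phase = aggregate(to_wave([1.0, 1.0], [0.0, 0.0]), mix)
    out_of_phase = aggregate(to_wave([1.0, 1.0], [0.0, math.pi]), mix)
    print(in_phase, out_of_phase)  # constructive vs. destructive interference
    print(adaptive_fuse([1.0], [0.0], [0.0]))  # zero logit gives g = 0.5 -> [0.5]
```

The usage example shows why a phase term is useful: two tokens with equal amplitudes sum to 2 when in phase but cancel when their phases differ by π, which a purely amplitude-based weighted sum cannot express.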
