Abstract

Deriving semantically precise pseudo masks from image-level class labels for segmentation, namely image-level Weakly Supervised Semantic Segmentation (WSSS), remains challenging. Class Activation Maps (CAMs) from CNNs tend to highlight only the most discriminative parts of a class, such as a person's face, whereas Vision Transformers (ViTs) capture broader semantic regions but often miss class-specific boundaries, for example merging a human body with a nearby object such as a dog. In this work, we propose the Complementary Branch (CoBra), a novel dual-branch framework whose two distinct architectures provide complementary knowledge of class (from the CNN) and semantics (from the ViT). In particular, we learn a Class-Aware Projection (CAP) for the CNN branch and a Semantic-Aware Projection (SAP) for the ViT branch, combining their insights to enable new patch-level supervision and produce effective pseudo masks that integrate class and semantic information. Extensive experiments qualitatively and quantitatively investigate how each branch complements the other, demonstrating significant improvements. Project page and code are available at https://micv-yonsei.github.io/cobra2024/.
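To make the complementary-fusion idea concrete, the following is a minimal, hedged sketch of how per-patch class scores (CNN-style, confident only on discriminative parts) and per-patch semantic scores (ViT-style, broader coverage) might be combined into a binary pseudo mask. The function name, the simple averaging rule, and the threshold are illustrative assumptions, not CoBra's actual CAP/SAP formulation, which is defined in the paper.

```python
# Illustrative sketch only: the averaging fusion and threshold below are
# assumptions for exposition, not the method proposed in the CoBra paper.

def fuse_pseudo_mask(cam_scores, vit_scores, threshold=0.5):
    """Combine per-patch class scores from a CNN CAM with per-patch
    semantic scores from a ViT into a binary pseudo mask.
    Both inputs are flat lists of floats in [0, 1] of equal length."""
    assert len(cam_scores) == len(vit_scores)
    # Average the two cues so a patch is kept when it is supported
    # strongly enough by either the class or the semantic signal.
    fused = [(c + s) / 2.0 for c, s in zip(cam_scores, vit_scores)]
    return [1 if f >= threshold else 0 for f in fused]

# Example: the CNN cue fires mainly on a discriminative part (e.g. the face),
# while the ViT cue covers the broader object region.
cam = [0.9, 0.8, 0.1, 0.0]
vit = [0.7, 0.6, 0.8, 0.2]
print(fuse_pseudo_mask(cam, vit))  # [1, 1, 0, 0]
```

In this toy example the third patch is covered by the semantic cue alone (0.8) but its fused score (0.45) falls below the threshold, illustrating why the paper learns dedicated projections (CAP and SAP) and patch-level supervision rather than relying on a naive average.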
