Abstract

Deriving semantically precise pseudo masks from image-level class labels for segmentation, namely image-level Weakly Supervised Semantic Segmentation (WSSS), remains challenging. Class Activation Maps (CAMs) from CNNs tend to highlight only the most discriminative parts of a class, such as a person's face, whereas Vision Transformers (ViTs) capture broader semantic regions but often miss class-specific boundaries, for example merging a human body with a nearby object such as a dog. In this work, we propose the Complementary Branch (CoBra), a novel dual-branch framework whose two distinct architectures provide complementary knowledge of class (from the CNN) and semantics (from the ViT). In particular, we learn a Class-Aware Projection (CAP) for the CNN branch and a Semantic-Aware Projection (SAP) for the ViT branch, combining their insights to enable new patch-level supervision and produce effective pseudo masks that integrate class and semantic information. Extensive experiments qualitatively and quantitatively investigate how each branch complements the other, demonstrating significant improvements. Project page and code are available at https://micv-yonsei.github.io/cobra2024/.
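To make the complementary-fusion idea concrete, the following is a minimal, hedged sketch of how per-patch class scores (CNN-style, confident only on discriminative parts) and per-patch semantic scores (ViT-style, broader coverage) might be combined into a binary pseudo mask. The function name, the simple averaging rule, and the threshold are illustrative assumptions, not CoBra's actual CAP/SAP formulation, which is defined in the paper.

```python
# Illustrative sketch only: the averaging fusion and threshold below are
# assumptions for exposition, not the method proposed in the CoBra paper.

def fuse_pseudo_mask(cam_scores, vit_scores, threshold=0.5):
    """Combine per-patch class scores from a CNN CAM with per-patch
    semantic scores from a ViT into a binary pseudo mask.
    Both inputs are flat lists of floats in [0, 1] of equal length."""
    assert len(cam_scores) == len(vit_scores)
    # Average the two cues so a patch is kept when it is supported
    # strongly enough by either the class or the semantic signal.
    fused = [(c + s) / 2.0 for c, s in zip(cam_scores, vit_scores)]
    return [1 if f >= threshold else 0 for f in fused]

# Example: the CNN cue fires mainly on a discriminative part (e.g. the face),
# while the ViT cue covers the broader object region.
cam = [0.9, 0.8, 0.1, 0.0]
vit = [0.7, 0.6, 0.8, 0.2]
print(fuse_pseudo_mask(cam, vit))  # [1, 1, 0, 0]
```

In this toy example the third patch is covered by the semantic cue alone (0.8) but its fused score (0.45) falls below the threshold, illustrating why the paper learns dedicated projections (CAP and SAP) and patch-level supervision rather than relying on a naive average.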
