Abstract

Current weakly supervised semantic segmentation methods usually generate noisy pseudo-labels. Training segmentation models with these labels tends to overfit the noise, leading to poor performance. Existing approaches often rely on iterative updates of pseudo-labels at the pixel or image level, ignoring the importance of region-level characteristics. The recently introduced Segment Anything Model (SAM) has advanced several approaches by fusing such region-level masks with noisy pseudo-labels. However, this fusion remains challenging because SAM masks lack semantic information. To address these challenges, we propose Region-based Online Selective Examination (ROSE). Specifically, we first consolidate SAM masks in a bottom-up manner to form a unified region prior. Leveraging this prior, region-level visual information is then aggregated through the proposed region voting strategy. Furthermore, a cross-view selective examination method exploits semantic consistency between different views of an image and performs an examination to correct noisy pseudo-labels. Experimental results show that ROSE achieves a new state of the art on the Pascal VOC and COCO datasets. Moreover, ROSE trains more than 10 times faster than previous methods.
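
To illustrate the general idea of region-level voting, the sketch below aggregates per-pixel pseudo-label scores inside each consolidated SAM region and reassigns the winning class to the whole region. This is a minimal, hypothetical interpretation written for clarity; the function name, the mean-then-argmax vote, and the input layout are assumptions, not the paper's exact algorithm.

```python
import numpy as np

def region_vote(region_ids: np.ndarray, class_scores: np.ndarray) -> np.ndarray:
    """Aggregate noisy per-pixel class scores within each region (illustrative only).

    region_ids:   (H, W) int map, one id per consolidated SAM region (assumed given).
    class_scores: (H, W, C) float array of noisy pseudo-label scores.
    Returns an (H, W) array of region-refined class labels.
    """
    refined = np.zeros(region_ids.shape, dtype=np.int64)
    for rid in np.unique(region_ids):
        mask = region_ids == rid
        # Vote: average the scores over the region, then take the dominant class.
        mean_scores = class_scores[mask].mean(axis=0)
        refined[mask] = mean_scores.argmax()
    return refined
```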
