Weakly supervised instance segmentation based on image-level class labels has recently gained much attention; its key step is generating pseudo labels from class activation maps (CAMs). Most methods train the classification model with binary cross-entropy (BCE) loss. However, because BCE loss does not treat classes as mutually exclusive, activations for different classes occur independently. As a result, not only are foreground classes wrongly activated as background, but incorrect activations also occur among confusing classes in the foreground. To solve this problem, we propose the Class Double-Activation Map, called Double-CAM. First, the vanilla CAM is extracted from the multi-label classifier and fused with the output feature map of the backbone. The enhanced feature map of each class is fed into a single-label classification branch with softmax cross-entropy (SCE) loss and an entropy minimization module, from which the more accurate Double-CAM is extracted. Double-CAM refines the vanilla CAM and thereby improves the quality of the pseudo labels. Second, to mine object edge cues from Double-CAM, we propose the Boundary Localization (BL) module, which synthesizes boundary annotations and thus provides more explicit constraints for label propagation without additional supervision. Adding the BL module also substantially improves the quality of the pseudo masks. Finally, the generated pseudo labels are used to train fully supervised instance segmentation networks. Evaluations on the VOC and COCO datasets show that our method outperforms mainstream weakly supervised segmentation methods at the same supervision level, and even those that depend on stronger supervision.
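The difference between independent (BCE-style) and mutually exclusive (SCE-style) class activations can be illustrated with a minimal sketch. It is not the Double-CAM implementation: the logits are hypothetical values for three classes at a single spatial location, the function names are illustrative, and the entropy function only indicates the kind of quantity an entropy minimization module might penalize.

```python
import math

def sigmoid_scores(logits):
    # Per-class sigmoid, as paired with BCE loss: each class is scored
    # independently, so several classes can be "on" at the same time.
    return [1.0 / (1.0 + math.exp(-z)) for z in logits]

def softmax_scores(logits):
    # Softmax, as paired with SCE loss: scores compete and sum to 1.
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    # Shannon entropy of a probability vector (natural log).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical logits: two confusing classes with similar evidence,
# plus one clearly absent class.
logits = [2.0, 1.5, -1.0]

sig = sigmoid_scores(logits)
soft = softmax_scores(logits)

# Assumption for illustration: doubling the logits stands in for a
# more confident (lower-entropy) prediction.
sharper = softmax_scores([2.0 * z for z in logits])

print("sigmoid:", [round(p, 3) for p in sig])
print("softmax:", [round(p, 3) for p in soft])
print("entropy(softmax):", round(entropy(soft), 3))
print("entropy(sharper):", round(entropy(sharper), 3))
```

With these logits, the sigmoid scores of the first two classes both exceed 0.5, so both would count as activated, whereas the softmax scores sum to one and force the two classes to compete. The sharper distribution has lower entropy, which is the direction an entropy minimization term would push the prediction.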