Weakly supervised semantic segmentation based on image-level labels abandons the pixel-level labels that traditional semantic segmentation algorithms rely on. It uses only image-level labels as supervision, thereby reducing the time and human effort required to annotate pixel data. The prevailing approach in weakly supervised segmentation is a two-step method that introduces an additional network and numerous parameters, complicating the model structure. Furthermore, image-level labels typically furnish only category information for the entire image and provide neither specific location details nor accurate target boundaries during model training. We propose an innovative One-Step Triple Enhanced weakly supervised semantic segmentation network (OSTE). OSTE streamlines the model structure and accomplishes both pseudo-label generation and semantic segmentation in a single step. In addition, building on the class activation map construction method, we enhance the weakly supervised semantic segmentation network in three key aspects to improve segmentation accuracy. First, by integrating local information from the activation map with the image, we enhance the network's localization and expansion capabilities to obtain more accurate and richer location information. Second, we refine the seed areas of the class activation map by exploiting the correlation between multi-level features. Finally, we incorporate conditional random field theory to generate pseudo-labels with higher confidence and richer boundary information. Compared with the prevailing two-step weakly supervised semantic segmentation schemes, the proposed segmentation network achieves a competitive mean Intersection over Union (mIoU) score of 58.47% on Pascal VOC. It also improves the mIoU score by at least 5.03% compared with existing end-to-end schemes.
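For readers unfamiliar with the metric reported above, the following is a minimal sketch of how mIoU is commonly computed from predicted and ground-truth label maps. It is not the evaluation code used for OSTE. The function name `mean_iou`, the `ignore_index` value of 255 (a common convention for void pixels in Pascal VOC annotations), and the choice to average only over classes that appear in either map are assumptions made for illustration.

```python
import numpy as np


def mean_iou(pred, target, num_classes, ignore_index=255):
    """Illustrative mIoU: mean of per-class intersection / union.

    Assumes integer class ids in [0, num_classes) and that pixels whose
    ground-truth value equals ignore_index are excluded from scoring.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()

    # Drop pixels marked as void in the ground truth.
    valid = target != ignore_index
    pred, target = pred[valid], target[valid]

    # Confusion matrix: rows are ground-truth classes, columns are predictions.
    conf = np.bincount(
        target * num_classes + pred, minlength=num_classes ** 2
    ).reshape(num_classes, num_classes)

    intersection = np.diag(conf)
    union = conf.sum(axis=0) + conf.sum(axis=1) - intersection

    # Average only over classes present in the prediction or the ground truth.
    present = union > 0
    return float((intersection[present] / union[present]).mean())


if __name__ == "__main__":
    pred = [[0, 0], [1, 1]]
    target = [[0, 1], [1, 1]]
    print(mean_iou(pred, target, num_classes=2))
```

In this toy case class 0 has IoU 1/2 and class 1 has IoU 2/3, so the score is their mean. Benchmark implementations usually accumulate the confusion matrix over the whole dataset before dividing, rather than averaging per-image scores.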