Abstract

A critical obstacle to achieving semantic segmentation of remote sensing images with deep convolutional neural networks is the need for huge numbers of pixel-level labels. Taking building extraction as an example, this study focuses on how to effectively apply weakly supervised semantic segmentation (WSSS) with image-level labels, a prominent solution to the labeling challenge, to high-resolution (HR) remote sensing images. We adopt the widely used two-step WSSS framework, in which pseudo-masks are first produced from image-level labels and then used to train a segmentation network. In addition, the fully connected conditional random field (CRF) is used to exploit spatial context in both the training and prediction stages. Detailed analyses of applying WSSS to HR images are conducted in terms of producing pseudo-masks, training the segmentation network, and optimizing predictions. We show that the tradeoff between the precision and recall of pseudo-masks, as well as the boundary accuracy and the background, needs to be carefully considered. The benefits of the segmentation network in the two-step framework are demonstrated in comparison with using only a classification network for WSSS, and the CRF loss is found to be powerful for improving the segmentation network, although it is not appropriate for dense buildings. An overlapping strategy and CRF postprocessing are further shown to be effective for refining the segmentation results during inference. With carefully chosen settings, we obtain results comparable to those of fully supervised methods on the ISPRS Potsdam and Vaihingen datasets, which is meaningful for promoting WSSS applications that extract geographic information from HR images.
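The abstract names an overlapping strategy for inference but does not detail it. The sketch below assumes the common approach of sliding a fixed-size window over a large image with overlap and averaging per-pixel class scores wherever tiles overlap, which smooths out tile-border artifacts. The function names `predict_with_overlap` and `toy_predict`, the tile size of 256 and the stride of 128 are hypothetical choices for illustration, not the authors' implementation; `toy_predict` merely stands in for a trained segmentation network. CRF postprocessing is not shown.

```python
import numpy as np
from typing import Callable


def predict_with_overlap(image: np.ndarray,
                         predict_tile: Callable[[np.ndarray], np.ndarray],
                         tile: int = 256,
                         stride: int = 128) -> np.ndarray:
    """Average per-pixel class scores over overlapping tiles (hypothetical helper).

    image: (H, W, C) array; predict_tile maps a (tile, tile, C) patch
    to (tile, tile, K) class scores. Requires H >= tile and W >= tile.
    """
    h, w = image.shape[:2]
    if h < tile or w < tile:
        raise ValueError("image must be at least one tile in each dimension")
    # Tile origins: regular steps plus one final tile flush with the border,
    # so every pixel is covered at least once.
    ys = sorted(set(range(0, h - tile + 1, stride)) | {h - tile})
    xs = sorted(set(range(0, w - tile + 1, stride)) | {w - tile})
    score_sum = None
    counts = np.zeros((h, w, 1), dtype=np.float64)
    for y in ys:
        for x in xs:
            scores = predict_tile(image[y:y + tile, x:x + tile])
            if score_sum is None:
                score_sum = np.zeros((h, w, scores.shape[-1]), dtype=np.float64)
            score_sum[y:y + tile, x:x + tile] += scores
            counts[y:y + tile, x:x + tile] += 1
    # counts broadcasts over the class axis K.
    return score_sum / counts


def toy_predict(patch: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for a trained network: "building" score = brightness.
    b = patch.mean(axis=-1, keepdims=True)
    return np.concatenate([1.0 - b, b], axis=-1)


rng = np.random.default_rng(0)
img = rng.random((300, 400, 3))
probs = predict_with_overlap(img, toy_predict, tile=256, stride=128)
labels = probs.argmax(axis=-1)  # 0 = background, 1 = building
```

Because `toy_predict` is purely per-pixel, the averaged result equals predicting the whole image at once; with a real network, whose output near tile edges lacks context, averaging over overlaps is what helps.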
