Abstract

Building rooftop segmentation using deep learning techniques is a popular yet challenging area of research in computer vision and remote sensing image processing. While recent studies have developed various deep learning models, there has been less focus on investigating the basic elements of network architecture and the foreground-background balance of sampled images. To address this research gap, this study proposes optimizing the basic elements of UNet and the image foreground-background balance to improve building rooftop segmentation accuracy. The Inria dataset is used for model training and accuracy evaluation. The results show that the impact of the network backbone on segmentation accuracy depends on its ability to extract lower-level features and on whether it is pretrained on a large amount of data. This study also finds that residual structures in the network backbone can negatively affect performance regardless of network depth. Among the upsampling strategies for the decoding network, the proposed super-resolution method achieves the best segmentation results, although it does not differ significantly from the commonly used bilinear interpolation and deconvolution methods. Additionally, this study proposes a hybrid loss function that considers area, arithmetic, shape, spectral, and texture discrepancy, which further improves segmentation performance. Finally, this study highlights the significant impact of image foreground-background balance on building rooftop segmentation and proposes an ensemble model to mitigate foreground-background imbalance and improve segmentation performance.
