Abstract

The application of convolutional neural networks has been shown to significantly improve the accuracy of building extraction from very high-resolution (VHR) remote sensing images. However, the large intraclass variance of buildings creates so-called semantic gaps among different kinds of buildings, and most present-day methods are ineffective at extracting diverse buildings across large areas that cover different scenes (for example, urban villages versus high-rise buildings), because existing strategies apply the same extraction scheme to every scene. As the resolution of remote sensing images improves, it becomes feasible to improve image interpretation based on scene priors; however, this idea has not been fully exploited in building extraction from VHR remote sensing imagery. This study proposes a scene-driven multitask parallel attention convolutional network (MTPA-Net) to address these limitations. The proposed approach classifies the input image into multilabel scenes and separately maps buildings at the pixel level under each scene. A simple postprocessing step then integrates the scene-specific building extraction results with the scene prior. The method does not require training multiple models, and the network learns in an end-to-end manner. Its performance is evaluated on a data set covering various urban and rural scenes with diverse landscapes. The experimental results show that the proposed MTPA-Net outperforms state-of-the-art algorithms, reducing misclassification areas while maintaining improved robustness.
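The scene-driven fusion idea sketched in the abstract (weight scene-specific building maps by the predicted scene scores, then threshold) can be illustrated with a minimal NumPy sketch. The function name `fuse_scene_branches`, the weighted-average fusion rule, and the array shapes are illustrative assumptions, not the paper's actual postprocessing method.

```python
import numpy as np

def fuse_scene_branches(branch_probs, scene_scores, threshold=0.5):
    """Hypothetical scene-prior fusion: weight each scene-specific
    building probability map by its multilabel scene score, average,
    and threshold to obtain a binary building mask.

    branch_probs : list of (H, W) arrays, one per scene branch
    scene_scores : per-scene confidence scores from the scene classifier
    """
    weights = np.asarray(scene_scores, dtype=float)
    weights = weights / weights.sum()          # normalize scene scores
    stacked = np.stack(branch_probs)           # (num_scenes, H, W)
    fused = np.tensordot(weights, stacked, axes=1)  # weighted mean, (H, W)
    return (fused >= threshold).astype(np.uint8)

# Toy usage: two scene branches over a 2x2 tile, scene classifier
# strongly favoring the first scene.
urban = np.array([[0.9, 0.1], [0.8, 0.2]])
rural = np.array([[0.2, 0.7], [0.1, 0.9]])
mask = fuse_scene_branches([urban, rural], scene_scores=[0.8, 0.2])
print(mask.tolist())  # → [[1, 0], [1, 0]]
```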
