Abstract

Three-dimensional (3D) building structures are vital to understanding urban dynamics. Compared with LiDAR data and multi-view imagery, monocular remote sensing imagery is a cost-effective data source for large-scale building height retrieval. Existing methods learn building footprints and height maps per pixel, using either a multi-task network or two separate networks, but they fail to consider information from neighboring pixels that belong to the same building. We therefore propose a novel representation for 3D buildings, namely 3D centripetal shifts, which provides a unified representation of individual building instances. Our method, termed 3DCentripetalNet, learns the 3D centripetal shift representation, which incorporates both the planar and vertical structures of buildings. A decoupling module is then devised to learn building corner points. Finally, a 3D modeling module retrieves building height from the learned 3D centripetal shift map and the corner points. We evaluate 3DCentripetalNet on two datasets with different spatial resolutions: the ISPRS Vaihingen dataset (9 cm/pixel) and the Urban 3D dataset (50 cm/pixel). Experimental results suggest that 3DCentripetalNet preserves sharp building boundaries, largely alleviates false detections, and significantly outperforms competing methods. We therefore believe that 3DCentripetalNet is a robust solution for building height retrieval from monocular imagery.
