Abstract

We present PolyBuilding, a polygon Transformer for building extraction. PolyBuilding directly predicts vector representations of buildings from remote sensing images. It builds upon an encoder–decoder transformer architecture and simultaneously predicts the bounding boxes and polygons of the building instances. Given a set of polygon queries, the model learns the relations among them and encodes context information from the image to predict the final set of building polygons with a fixed number of vertices. Since predicting a fixed number of vertices causes vertex redundancy and reduces polygon regularity, we design a corner classification head to distinguish the building corners. Leveraging the corner classification scores, a polygon refinement scheme removes the redundant vertices and produces final polygons with regular contours and low complexity. In addition, although the PolyBuilding model is fully end-to-end trainable, we propose a two-phase training strategy that decomposes coordinate regression and corner classification into two stages to alleviate the difficulty of multi-task learning. Comprehensive experiments are conducted on the CrowdAI and Inria datasets. PolyBuilding achieves a new state of the art in terms of pixel-level coverage, instance-level detection performance, and geometry-level properties. Quantitative and qualitative results verify the superiority and effectiveness of our model.
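The polygon refinement scheme described above can be illustrated with a minimal sketch: given a fixed-length polygon and per-vertex corner classification scores, redundant vertices are dropped by thresholding the scores. The function name, the threshold value, and the fallback for degenerate cases are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def refine_polygon(vertices, corner_scores, threshold=0.5, min_vertices=3):
    """Prune redundant vertices using corner classification scores.

    vertices: (N, 2) array of predicted polygon vertex coordinates.
    corner_scores: (N,) array of corner-classification probabilities.
    threshold: assumed cutoff for calling a vertex a building corner.
    Falls back to the top-scoring vertices if fewer than `min_vertices`
    pass the threshold, so the result remains a valid polygon.
    """
    keep = corner_scores >= threshold
    if keep.sum() < min_vertices:
        # Keep the highest-scoring vertices, preserving their order
        # along the contour so the polygon is not scrambled.
        top = np.sort(np.argsort(corner_scores)[-min_vertices:])
        keep = np.zeros(len(corner_scores), dtype=bool)
        keep[top] = True
    return vertices[keep]
```

For example, a square predicted with 8 vertices (4 true corners plus 4 edge midpoints with low corner scores) would be reduced to its 4 corners.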
