Abstract

The semantic segmentation of aerial images enables many useful applications such as tracking city growth, tracking deforestation, or automatically creating and updating maps. However, gathering enough training data to train a proper model for the automated analysis of aerial images is usually too labor-intensive and thus too expensive in most cases. Therefore, domain adaptation techniques are often necessary to be able to adapt existing models or to transfer knowledge from existing datasets to new unlabeled aerial images. Modern adaptation approaches make use of complex architectures involving many model components, losses and loss weights. These approaches are hard to apply in practice since their hyperparameters are hard to optimize for a given adaptation problem. This complexity is the result of trying to separate domain-invariant elements, e.g., structures and shapes, from domain-specific elements, e.g., textures. In this paper, we present a novel model for semantic segmentation, which not only achieves state-of-the-art performance on aerial images, but also inherently learns separate feature representations for shapes and textures. Our goal is to provide a model which can serve as the basis for future domain adaptation approaches which are simpler but still effective. Through end-to-end training our deep learning model learns to map aerial images to feature representations which can be decoded into binary space partitioning trees, a resolution-independent representation of the semantic segmentation, which can then be rendered into a pixelwise semantic segmentation in a differentiable way.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call