Abstract

In this paper, a novel convolutional neural network (CNN)-based architecture, named fine segmentation network (FSN), is proposed for semantic segmentation of high-resolution aerial images and light detection and ranging (LiDAR) data. The proposed architecture follows the encoder–decoder paradigm, and multi-sensor fusion is accomplished at the feature level using a multi-layer perceptron (MLP). The encoder consists of two parts: a main encoder, based on the convolutional layers of the VGG-16 network, for color-infrared images, and a lightweight branch for LiDAR data. In the decoder stage, sub-pixel convolution layers replace transposed convolutional layers and other common up-sampling layers to adaptively upscale the coarse outputs of the encoder. Based on this design, features from different stages and sensors are integrated for MLP-based high-level learning. In the training phase, transfer learning is employed to transfer features learned from a generic dataset to remote sensing data. The proposed FSN is evaluated on the International Society for Photogrammetry and Remote Sensing (ISPRS) Potsdam and Vaihingen 2D Semantic Labeling datasets. Experimental results demonstrate that the proposed framework brings considerable improvement over other related networks.
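The sub-pixel convolution layers mentioned above upscale feature maps by rearranging channels into spatial resolution (often called "pixel shuffle"), so the up-sampling is learned by the preceding convolution rather than fixed by interpolation. The following is a minimal NumPy sketch of that channel-to-space rearrangement only; the function name and array shapes are illustrative and not taken from the paper:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a (C*r^2, H, W) feature map into (C, H*r, W*r).

    Each group of r*r channels at spatial location (h, w) becomes
    an r-by-r spatial block in the upscaled output, which is how a
    sub-pixel convolution layer realizes learned upsampling.
    """
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c_r2 // (r * r)
    # (C, r, r, H, W) -> (C, H, r, W, r) -> (C, H*r, W*r)
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)
    return x.reshape(c, h * r, w * r)
```

For example, a 4-channel, 1x1 input with scale factor 2 becomes a single-channel 2x2 output whose pixels are the four input channels.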

Highlights

  • Semantic segmentation of high-resolution remote sensing images aims at assigning each pixel a certain semantic class, for instance, building, car, tree, or low vegetation

  • We evaluate fully convolutional network (FCN)-8s, SegNet, fine segmentation network (FSN)-noL, HSN, FSN, and FSN with post-processing (FSN + conditional random fields (CRFs)) on the Potsdam test set

  • We can observe that FSN-noL achieved lower errors than FCN-8s and SegNet: it made fewer mistakes in the impervious surface class even without light detection and ranging (LiDAR) data


Introduction

Semantic segmentation of high-resolution remote sensing images aims at assigning each pixel a certain semantic class, for instance, building, car, tree, or low vegetation. New models based on object-oriented classification [7] and sparse representation [8] have also been developed. Although these frameworks may obtain satisfactory classification performance, they typically suffer from major drawbacks that can jeopardize the processing outcomes. In scenarios characterized by high spectral complexity, these architectures are unable to accurately capture the interplay among samples at global and local scales. Such shallow learning networks cannot satisfy the requirements imposed by the complexity and diversity of functions and training samples because they usually have only one hidden layer.
