Abstract

We studied whether point clouds derived from tri-stereo satellite imagery are suitable for semantic segmentation with generalized sparse convolutional neural networks, using an Austrian study area as an example. In particular, we examined whether the distorted geometric information, in addition to color, influences the performance of segmenting clutter, roads, buildings, trees, and vehicles. To this end, we trained a fully convolutional neural network that uses generalized sparse convolution once solely on 3D geometric information (i.e., a 3D point cloud derived by dense image matching) and twice on 3D geometric as well as color information. In the first of these two experiments we did not use class weights, whereas in the second we did. We compared the results with a fully convolutional neural network trained on a 2D orthophoto and with a decision tree trained once on hand-crafted 3D geometric features and once on hand-crafted 3D geometric as well as color features. The decision tree using hand-crafted features has been successfully applied to aerial laser scanning data in the literature. Hence, we compared our main interest of study, a representation learning technique, with another representation learning technique and a non-representation learning technique. Our study area is located in Waldviertel, a region in Lower Austria. The territory is hilly and covered mainly by forests, agriculture, and grasslands. Our classes of interest are heavily unbalanced. However, we did not use any data augmentation techniques to counter overfitting. For our study area, we report that using color in addition to geometric information improves the performance of the Generalized Sparse Convolutional Neural Network (GSCNN) only on the dominant class, which leads to a higher overall performance in our case. We also found that training the network with median class weighting partially reverts the effects of adding color. The network also started to learn the classes with lower occurrences. The fully convolutional neural network trained on the 2D orthophoto generally outperforms the other two, with a kappa score of over 90% and an average per-class accuracy of 61%. However, the decision tree trained on colors and hand-crafted geometric features has a 2% higher accuracy for roads.
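
The class weighting and the two reported scores can be made concrete with a small sketch. It assumes that "median class weighting" refers to median frequency balancing (weight = median class frequency divided by the class's own frequency) and that "average per-class accuracy" is the mean of the per-class fractions of correctly labelled points; the abstract does not state either definition, and all function names below are hypothetical.

```python
from collections import Counter
from statistics import median


def median_frequency_weights(labels):
    """Weight each class by median(class frequencies) / its own frequency.

    Assumption: this is one common reading of "median class weighting".
    """
    counts = Counter(labels)
    total = sum(counts.values())
    freqs = {cls: n / total for cls, n in counts.items()}
    med = median(freqs.values())
    return {cls: med / f for cls, f in freqs.items()}


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) if chance agreement equals 1.
    """
    n = len(y_true)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1 - expected)


def mean_per_class_accuracy(y_true, y_pred):
    """Mean over reference classes of the share of points labelled correctly."""
    true_counts = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[c] / true_counts[c] for c in true_counts) / len(true_counts)


# Toy labels only, not data from the study.
labels = ["clutter"] * 6 + ["tree"] * 3 + ["road"]
weights = median_frequency_weights(labels)  # the rare "road" class gets the largest weight
```

With heavily unbalanced classes, overall accuracy is dominated by the most frequent class, which is why a chance-corrected score and a per-class average are reported side by side.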

Highlights

  • Deep Learning (DL) has drastically improved the state-of-the-art across a variety of applications in Natural Language Processing (NLP) and Computer Vision (CV) [1,2,3,4] due to its ability to learn representations in an unsupervised manner [5,6,7]

  • The image processing chain consists of the following main steps: (1) import of the images, Rational Polynomial Coefficients (RPCs), and Ground Control Points (GCPs), followed by image pyramid generation; (2) identification of the GCP locations within the images; (3) RPC refinement based on the GCPs and automatically extracted tie points; (4) dense image matching for 3D reconstruction; and (5) point cloud interpolation for Digital Surface Model (DSM) derivation
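
Step (5) of the chain can be illustrated with a deliberately simple gridding scheme. This is a sketch only: the text does not say which interpolation method was used, so the code below assumes a plain "highest point per cell" rasterization with no gap filling, and the function name, the cell size, and the toy coordinates are all hypothetical.

```python
def grid_max_dsm(points, cell_size):
    """Rasterize (x, y, z) points into a row-major grid of per-cell maximum heights.

    Row 0 is the northernmost row (largest y). Cells without points stay None.
    Assumption: a max-z binning stands in for the unspecified interpolation.
    """
    points = list(points)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_max = min(xs), max(ys)
    n_cols = int((max(xs) - x_min) // cell_size) + 1
    n_rows = int((y_max - min(ys)) // cell_size) + 1

    dsm = [[None] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        col = int((x - x_min) // cell_size)
        row = int((y_max - y) // cell_size)
        current = dsm[row][col]
        if current is None or z > current:
            dsm[row][col] = z  # keep the highest surface point in the cell
    return dsm


# Toy point cloud in arbitrary units, not data from the study.
cloud = [(0.0, 0.0, 1.0), (0.4, 0.2, 3.0), (1.5, 0.1, 2.0), (0.2, 1.7, 5.0)]
dsm = grid_max_dsm(cloud, cell_size=1.0)
```

Keeping the maximum height per cell matches the idea of a surface model, which records the top of buildings and vegetation rather than the ground; a production workflow would additionally fill empty cells by interpolation.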

Summary

Introduction

Deep Learning (DL) has drastically improved the state-of-the-art across a variety of applications in Natural Language Processing (NLP) and Computer Vision (CV) [1,2,3,4] due to its ability to learn representations in an unsupervised manner [5,6,7]. In CV, DL has advanced, inter alia, the tasks of image classification [12,13], object detection [14,15,16], object tracking [17], pose estimation [18,19,20,21], super-resolution [22], and semantic segmentation [23,24,25,26,27,28]. These advancements give rise to new applications in, e.g., solid-state materials science and chemical sciences [29,30], meteorology [31], medicine [32,33,34,35,36,37,38,39], seismology [40,41,42], biology [43], life sciences in general [44], chemistry [45], and physics [46,47,48,49,50,51,52,53], as well as the fashion industry [54,55,56]. The output of dense image matching is a 3D point cloud, which is then used as input for DSM derivation.
