Abstract

In the last decade, we have observed an increasing demand for indoor scene modeling in various applications, such as mobility inside buildings, emergency and rescue operations, and maintenance. Automatically distinguishing between structural elements of buildings, such as walls, ceilings, floors, windows, and doors, and typical objects in buildings, such as chairs, tables, and shelves, is particularly important for many tasks, such as 3D building modeling or navigation. This information can generally be retrieved through semantic labeling. In the past few years, convolutional neural networks (CNN) have become the preferred method for semantic labeling. Furthermore, there is ongoing research on fusing RGB and depth images in CNN frameworks. For pixel-level labeling, encoder-decoder CNN frameworks have been shown to be the most effective. In this study, we adopt an encoder-decoder CNN architecture to label structural elements in buildings and investigate the influence of using depth information on the detection of typical objects in buildings. For this purpose, we introduce an approach that combines a depth map with RGB images by converting the original image to the HSV color space, substituting the V channel with the depth information (D), and using the result in the CNN architecture. As a further variation of this approach, we also transform the HSD images back to the RGB color space and use them within the CNN. This approach allows a CNN designed for three-channel image input to be used, and our results to be compared directly with RGB-based labeling within the same network. We perform our tests using the Stanford 2D-3D-Semantics Dataset (2D-3D-S), a widely used indoor dataset. Furthermore, we compare our approach with results obtained using a four-channel input created by stacking RGB and depth (RGBD). Our investigation shows that fusing RGB and depth improves semantic labeling results, particularly on structural elements of buildings.
On the 2D-3D-S dataset, we achieve up to 92.1 % global accuracy, compared to 90.9 % using RGB only and 93.6 % using RGBD. Moreover, the Intersection over Union scores have improved when using depth, which shows that it yields better labeling results at object boundaries.
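The HSD fusion described in the abstract can be sketched in a few lines. The snippet below is a minimal per-pixel illustration using Python's standard `colorsys` module, assuming RGB channels and depth values are already normalized to [0, 1]; the function name and the list-of-tuples representation are illustrative, not the authors' implementation (which would operate on full image arrays).

```python
import colorsys

def fuse_rgb_depth(rgb_pixels, depth, back_to_rgb=False):
    """Fuse per-pixel depth into an RGB image via the HSV color space.

    rgb_pixels: list of (r, g, b) tuples, each channel in [0, 1]
    depth:      list of depth values normalized to [0, 1]
    Returns HSD tuples, or pseudo-RGB tuples if back_to_rgb is True.
    """
    fused = []
    for (r, g, b), d in zip(rgb_pixels, depth):
        # Convert to HSV and discard the V (brightness) channel.
        h, s, _v = colorsys.rgb_to_hsv(r, g, b)
        if back_to_rgb:
            # Reinterpret (H, S, D) as if it were HSV and map back to RGB,
            # producing a pseudo-RGB image a standard three-channel CNN accepts.
            fused.append(colorsys.hsv_to_rgb(h, s, d))
        else:
            fused.append((h, s, d))
    return fused
```

The `back_to_rgb` branch corresponds to the paper's second variant, in which the HSD image is transformed back to RGB so the same three-channel network can consume either input without architectural changes.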

Highlights

  • Urbanization and population growth in cities have increased interest in generating detailed hierarchical models of urban areas, including objects such as buildings, roads, and ponds, described by their geometry and semantics

  • For evaluation we use Intersection over Union (IoU), the average, over classes, of the intersection of prediction and ground truth divided by their union

  • We investigated the influence of fusing RGB with depth information to label indoor scenes within an encoder-decoder convolutional neural network (CNN) framework and compared it with the performance of RGB-based labeling



Introduction

Urbanization and population growth in cities have increased interest in generating detailed hierarchical models of urban areas, including objects such as buildings, roads, and ponds, described by their geometry and semantics. Among these objects, buildings play an important role, as people perform most of their activities indoors. This requires systems that provide improved support for applications including, but not limited to, mobility inside buildings, emergency and rescue operations, and maintenance tasks. Automatically distinguishing between structural elements of buildings, such as walls, ceilings, floors, win-

