Abstract

High-level structure (HLS) extraction recovers 3D elements on human-made surfaces (objects, buildings, ground, etc.). Most existing approaches to HLS extraction either process two or more images captured from different camera views or process 3D data in the form of point clouds extracted from the camera images. In general, point cloud and multiple-view approaches perform well on scenes captured as video or image sequences, but they need sufficient parallax to guarantee accuracy. An alternative is to process a single RGB image and interpret the image regions where human-made structure may be observed. This removes the dependency on parallax but adds the challenge of correctly resolving image ambiguities. Motivated by this alternative, we propose a methodology for 3D volumetric structure extraction from a single image. Our strategy is to divide and simplify the 3D structure extraction process into three steps. First, a structure recognition step provides the segmentation, location, and delimitation of the urbanized structures in the scene. Second, a graph analysis classifies and locates the boundaries between the different urbanized structures in the scene. Third, a proposed CNN and the pinhole camera model are used to extract the 3D volumetric structure. We evaluate this methodology on synthetic and public datasets.
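The abstract names the pinhole camera model as the geometric tool of the third step but does not give formulas. The sketch below illustrates only the standard pinhole relations between a pixel and a 3D point in the camera frame; it is not the authors' implementation. The function names, the intrinsic parameters `fx`, `fy`, `cx`, `cy`, and the assumption that a depth value is already available for the pixel are all hypothetical choices made for illustration.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Map pixel (u, v) with a known depth to a 3D point in the camera frame.

    Assumption: depth is measured along the optical axis (the Z coordinate).
    How depth is obtained is outside this sketch.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def project(point, fx, fy, cx, cy):
    """Project a 3D camera-frame point onto the image plane (pinhole model)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    u = fx * x / z + cx
    v = fy * y / z + cy
    return (u, v)


if __name__ == "__main__":
    # Hypothetical intrinsics for a 640x480 image.
    fx = fy = 500.0
    cx, cy = 320.0, 240.0
    point = backproject(420.0, 140.0, 2.0, fx, fy, cx, cy)
    print(point)
    print(project(point, fx, fy, cx, cy))
```

Projecting the back-projected point returns the original pixel, which is a quick consistency check on the two functions. The sketch also makes the single-image ambiguity concrete: without a depth value, a pixel constrains the 3D point only to a ray through the camera center.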
