Abstract

This article presents the first technique to estimate a 3D terrain model from a single landscape image. Although monocular depth estimation also offers single-image 3D reconstruction, it assigns depth only to the pixels visible in the input image, resulting in an incomplete 3D terrain output. Our method generates a complete 3D terrain model as a textured height map via a three-stage framework using deep neural networks. First, to exploit the performance of pixel-aligned estimation, we estimate the terrain's per-pixel depth and color, free of shadows and lighting effects, in the perspective view. Second, we triangulate the RGB-D data generated in the first stage and rasterize the triangular mesh from the top view to obtain an incomplete textured height map. Finally, we inpaint the depth and color in the missing regions. Because there are many plausible ways to complete the missing regions, we synthesize diverse shapes and textures during inpainting using a variational autoencoder. Qualitative and quantitative experiments show that our method outperforms existing methods that apply a direct perspective-to-top-view transform as image-to-image translation.
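As a rough illustration of the second stage, the Python sketch below back-projects a perspective depth map into 3D points and splats them onto a top-view grid. All function names, camera intrinsics (fx, fy, cx, cy), and grid parameters are hypothetical, and simple point splatting stands in for the paper's actual triangle-mesh rasterization; cells left empty (NaN) correspond to the missing regions handled by the third-stage inpainting.

```python
import numpy as np

def backproject_depth(depth, fx, fy, cx, cy):
    """Back-project a perspective depth map (H, W) into camera-space 3D points (H, W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)

def splat_top_view(points, colors, grid_res=256, extent=50.0):
    """Project points onto a top-view grid, keeping the highest surface per cell.
    Returns a height map and texture; unseen cells remain NaN (to be inpainted)."""
    height = np.full((grid_res, grid_res), np.nan)
    texture = np.full((grid_res, grid_res, 3), np.nan)
    pts = points.reshape(-1, 3)
    cols = colors.reshape(-1, 3)
    # Map ground-plane coordinates (x, z) into grid indices.
    gx = ((pts[:, 0] + extent / 2) / extent * grid_res).astype(int)
    gz = (pts[:, 2] / extent * grid_res).astype(int)
    valid = (gx >= 0) & (gx < grid_res) & (gz >= 0) & (gz < grid_res)
    for i in np.flatnonzero(valid):
        r, c = gz[i], gx[i]
        elev = -pts[i, 1]  # camera y points down; negate to get elevation
        if np.isnan(height[r, c]) or elev > height[r, c]:
            height[r, c] = elev
            texture[r, c] = cols[i]
    return height, texture

# Example: a flat synthetic depth map seen through a toy camera.
depth = np.full((4, 4), 10.0)
rgb = np.zeros((4, 4, 3))
pts = backproject_depth(depth, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
hmap, tex = splat_top_view(pts, rgb, grid_res=16, extent=20.0)
```

In the paper's pipeline, the RGB-D output is first triangulated into a mesh so that the top-view rasterization interpolates across surface gaps; the per-point splatting above is only a minimal approximation of that step.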
