HaLo‐NeRF: Learning Geometry‐Guided Semantics for Exploring Unconstrained Photo Collections

Chen Dudai, Itai Lang, Morris Alper, Rana Hanocka, Hadar Averbuch‐Elor, Hana Bezalel

doi:10.1111/cgf.15006

Abstract

Internet image collections containing photos captured by crowds of photographers show promise for enabling digital exploration of large‐scale tourist landmarks. However, prior works focus primarily on geometric reconstruction and visualization, neglecting the key role of language in providing a semantic interface for navigation and fine‐grained understanding. In more constrained 3D domains, recent methods have leveraged modern vision‐and‐language models as a strong prior of 2D visual semantics. While these models display an excellent understanding of broad visual semantics, they struggle with unconstrained photo collections depicting such tourist landmarks, as they lack expert knowledge of the architectural domain and fail to exploit the geometric consistency of images capturing multiple views of such scenes. In this work, we present a localization system that connects neural representations of scenes depicting large‐scale landmarks with text describing a semantic region within the scene, by harnessing the power of SOTA vision‐and‐language models with adaptations for understanding landmark scene semantics. To bolster such models with fine‐grained knowledge, we leverage large‐scale Internet data containing images of similar landmarks along with weakly‐related textual information. Our approach is built upon the premise that images physically grounded in space can provide a powerful supervision signal for localizing new concepts, whose semantics may be unlocked from Internet textual metadata with large language models. We use correspondences between views of scenes to bootstrap spatial understanding of these semantics, providing guidance for 3D‐compatible segmentation that ultimately lifts to a volumetric scene representation. To evaluate our method, we present a new benchmark dataset containing large‐scale scenes with ground‐truth segmentations for multiple semantic concepts. Our results show that HaLo‐NeRF can accurately localize a variety of semantic concepts related to architectural landmarks, surpassing the results of other 3D models as well as strong 2D segmentation baselines. Our code and data are publicly available at https://tau‐vailab.github.io/HaLo‐NeRF/.

Journal: Computer Graphics Forum
Publication Date: Apr 15, 2024
License type: CC BY 4.0

