Reflectance estimation for proximity sensing by vision-language models: utilizing distributional semantics for low-level cognition in robotics

Masashi Osada, Gustavo A Garcia Ricardez, Yosuke Suzuki, Tadahiro Taniguchi

doi:10.1080/01691864.2024.2393408

Open Access

https://doi.org/10.1080/01691864.2024.2393408

Journal: Advanced Robotics
Publication Date: Aug 31, 2024
Citations: 1
License type: CC BY 4.0

Abstract

Large language models (LLMs) and vision-language models (VLMs) have been increasingly used in robotics for high-level cognition, but their use for low-level cognition, such as interpreting sensor information, remains underexplored. In robotic grasping, estimating the reflectance of objects is crucial for success, as reflectance significantly affects the distance measured by proximity sensors. We investigate whether LLMs can estimate reflectance from object names alone, leveraging the human knowledge embedded in distributional semantics, and whether the latent structure of language in VLMs positively affects image-based reflectance estimation. In this paper, we verify that (1) LLMs such as GPT-3.5 and GPT-4 can estimate an object's reflectance using only text as input; and (2) VLMs such as CLIP can increase their generalization capabilities in reflectance estimation from images. Our experiments show that GPT-4 can estimate an object's reflectance using only text input with a mean error of 14.7%, lower than that of the image-only ResNet (17.8%). Moreover, CLIP achieved the lowest mean error of 11.8%, while GPT-3.5 obtained a competitive 19.9%. These results suggest that the distributional semantics in LLMs and VLMs increases their generalization capabilities, and that the knowledge acquired by VLMs benefits from the latent structure of language.
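To make the text-only setting concrete, the following is a minimal sketch of how an object name might be turned into a reflectance query and how a mean error could be scored. It is an illustration under stated assumptions, not the authors' method: the prompt wording, the helper names (build_prompt, parse_reflectance, mean_absolute_error), and the choice of mean absolute error in percentage points are all hypothetical, since the abstract specifies neither the prompt nor the exact error definition. No model is called; the replies would come from whichever LLM is being evaluated.

```python
import re
from statistics import mean


def build_prompt(object_name: str) -> str:
    """Build a text-only query. The wording is hypothetical, not the paper's prompt."""
    return (
        f"Estimate the reflectance of a '{object_name}' as a percentage "
        "between 0 and 100. Answer with a single number."
    )


def parse_reflectance(reply: str) -> float:
    """Extract the first number in a model reply and clamp it to [0, 100]."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no number found in reply: {reply!r}")
    return min(100.0, max(0.0, float(match.group())))


def mean_absolute_error(estimates, references):
    """Mean absolute difference in percentage points (an assumed metric)."""
    if len(estimates) != len(references):
        raise ValueError("estimates and references must have the same length")
    return mean(abs(e - r) for e, r in zip(estimates, references))
```

In use, each object name would be passed through build_prompt, the model's reply through parse_reflectance, and the resulting estimates compared against measured reflectance values with mean_absolute_error.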
