Abstract

Günther et al. (2022) investigated the relationship between words and images and concluded that a direct link between words and embodied experience is possible. In their study, participants were presented with a target noun and a pair of images, one chosen by their model and the other chosen at random, and were asked to select the image that best matched the target noun. Building on their work, we address the following questions. 1. Apart from visually embodied simulation, what other strategies might subjects have used? How much does this setup rely on visual information, and can it be solved using textual representations alone? 2. Do current visually grounded embeddings explain subjects' selection behavior better than textual embeddings do? 3. Does visual grounding improve the representations of both concrete and abstract words? To this end, we designed novel experiments based on pre-trained word embeddings. Our experiments reveal that subjects' selection behavior is explained to a large extent by text-based embeddings and word-based similarities, while visually grounded embeddings offered only modest advantages over textual embeddings in certain cases. These findings indicate that the experiment of Günther et al. (2022) may not be well suited to tapping into the perceptual experience of participants, and the extent to which it measures visually grounded knowledge is unclear.
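To make the text-only baseline alluded to above concrete, the following is a minimal, self-contained Python sketch of how a trial of the image-selection task could be solved with word embeddings alone. The toy vectors, labels, and function names are illustrative assumptions, not the authors' actual pipeline; in practice the vectors would come from a pre-trained model such as word2vec or GloVe.

import numpy as np

# Hypothetical pre-trained word embeddings, reduced to toy 3-d vectors here.
EMBEDDINGS = {
    "dog":    np.array([0.9, 0.1, 0.0]),
    "puppy":  np.array([0.8, 0.2, 0.1]),
    "violin": np.array([0.0, 0.1, 0.9]),
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def choose_image(target_noun, label_a, label_b):
    """Pick the candidate image whose (textual) label is closer to the
    target noun in embedding space -- no visual information involved."""
    t = EMBEDDINGS[target_noun]
    sim_a = cosine(t, EMBEDDINGS[label_a])
    sim_b = cosine(t, EMBEDDINGS[label_b])
    return label_a if sim_a >= sim_b else label_b

# Mirrors one trial: target noun "dog", a model-chosen image labeled
# "puppy" versus a random distractor labeled "violin".
print(choose_image("dog", "puppy", "violin"))  # -> "puppy"

If purely word-based similarities of this kind already predict which image participants pick, the task can be solved without appealing to visual experience, which is the core of the concern raised in the abstract.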
