Abstract Humans acquire their native languages by taking part in communicative interactions with their caregivers. These interactions are meaningful, intentional, and situated in their everyday environment. The situated and communicative nature of the interactions is essential to the language acquisition process, as language learners depend on clues provided by the communicative environment to make sense of the utterances they perceive. As such, the linguistic knowledge they build up is rooted in linguistic forms, their meaning, and their communicative function. When it comes to machines, the situated, communicative, and interactional aspects of language learning are often passed over. This applies in particular to today’s large language models (LLMs), where the input is predominantly text-based, and where the distribution of character groups or words serves as a basis for modeling the meaning of linguistic expressions. In this article, we argue that this design choice lies at the root of a number of important limitations, in particular regarding the data hungriness of the models, their limited ability to perform human-like logical and pragmatic reasoning, and their susceptibility to biases. At the same time, we make a case for an alternative approach that models how artificial agents can acquire linguistic structures by participating in situated communicative interactions. Through a selection of experiments, we show how the linguistic knowledge that is captured in the resulting models is of a fundamentally different nature than the knowledge captured by LLMs and argue that this change of perspective provides a promising path towards more human-like language processing in machines.
Read full abstract