ABSTRACT

Despite the exceptional performance of large language models (LLMs) on a wide range of natural language processing and reasoning tasks, there has been sharp disagreement about whether their abilities extend to more creative human capacities. A core example is the interpretation of novel metaphors. Here we assessed the ability of GPT-4, a state-of-the-art LLM, to provide natural-language interpretations of three kinds of material: metaphors from a recent AI benchmark (the Fig-QA dataset), novel literary metaphors drawn from Serbian poetry and translated into English, and entire novel English poems. GPT-4 outperformed previous AI models on the Fig-QA dataset. For the metaphors drawn from Serbian poetry, human judges rated GPT-4's interpretations as superior to those provided by a group of college students; the judges were blind to the fact that an AI model was involved. When interpreting reversed metaphors, GPT-4, like humans, showed signs of sensitivity to the Gricean cooperative principle. In addition, for several novel English poems, GPT-4 produced interpretations that a human literary critic rated as excellent or good. These results indicate that LLMs such as GPT-4 have acquired an emergent ability to interpret literary metaphors, including those embedded in novel poems.