Zipf's Law in Human-Machine Dialog

Guido M. Linders, Max M. Louwerse

doi:10.1145/3383652.3423878

Abstract

Zipf's law is a mathematically simple formula stating that the frequency of a word is approximately inversely proportional to its frequency rank. The law is well known in computational linguistics and the cognitive sciences alike. In the context of agent development, however, it has hardly ever been mentioned. This is surprising, as general principles of language are likely to benefit the development of conversational agents. This paper serves as a starting point for exploring the role of Zipf's law in agent development by showing that the law also applies to dialog and that it can shed light on human-machine dialog. In addition to overall word frequency distributions, we examined the frequency distributions of words at specific positions in the sentence, as well as the distribution of turn lengths. Zipf's law was found in the vast majority of the analyses we conducted. We also investigated whether Zipf's law can be used to detect differences between human-generated and agent-generated speech by correlating the two sets of distributions. Even though both the human and the agent frequency distributions follow Zipf's law, they are not necessarily similar to each other, which indicates where agent dialog may distinguish itself from human dialog. The findings in this paper can thus serve as a way to monitor the extent to which ubiquitous patterns in human-human dialog are also found in human-machine dialog.
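
As a rough illustration of the rank-frequency relation described above, the following Python sketch counts word frequencies, ranks them, and estimates the slope of log frequency against log rank; a slope near -1 is what Zipf's law predicts. The toy corpus, the function names rank_frequency and zipf_slope, and the ordinary least-squares fit in log-log space are assumptions made for illustration, not the procedure used in the paper.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return word frequencies sorted from most to least frequent."""
    return sorted(Counter(tokens).values(), reverse=True)


def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Assumes at least two frequencies, sorted in descending order.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(freq) for freq in frequencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Hypothetical toy corpus: the word of rank r occurs about 120 / r times.
tokens = []
for rank in range(1, 9):
    tokens += [f"w{rank}"] * (120 // rank)

freqs = rank_frequency(tokens)
slope = zipf_slope(freqs)
print(freqs)
print(round(slope, 2))
```

On real dialog data the tokens would come from tokenized turns, and the fitted slope is only a coarse indicator; a more careful analysis would also check how well the straight line fits across the full rank range.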

Full Text