Abstract

There has been much work over the last century on optimization of the lexicon for efficient communication, with a particular focus on the form of words as an evolving balance between production ease and communicative accuracy. Zipf’s law of abbreviation, the cross-linguistic trend for less-probable words to be longer, represents some of the strongest evidence that the lexicon is shaped by a pressure for communicative efficiency. However, the various sounds that make up words do not all contribute the same amount of disambiguating information to a listener. Rather, the information a sound contributes depends in part on which specific lexical competitors exist in the lexicon. In addition, because the speech stream is perceived incrementally, early sounds in a word contribute on average more information than later sounds. Using a dataset of diverse languages, we demonstrate that, above and beyond containing more sounds, less-probable words contain sounds that convey more disambiguating information overall. We show further that this pattern tends to be strongest at word beginnings, where sounds can contribute the most information.

Highlights

  • Human languages are characterized by hierarchically organized, nested structure: utterances are composed of structured sequences of words, and words in turn are composed of structured sequences of sounds

  • If we find that less-probable words have higher mean segment information when controlling for length, it suggests these words, in addition to having more segments, have more disambiguating information packed into those segments

  • The negative correlation that we find between mean segment information and word probability can only arise if the information in segments of less-probable words is concentrated in fewer segments
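
To make "mean segment information" concrete, here is a minimal sketch in Python. It is not the authors' code. It assumes one common operationalization: the information of a segment is the negative log (base 2) of the frequency-weighted share of words consistent with the prefix so far that also continue with that segment. The function names, the toy lexicon, and its counts are all hypothetical, and each character is treated as one segment.

```python
import math
from collections import defaultdict


def segment_information(lexicon):
    """Return per-segment information (bits) for every word.

    lexicon: dict mapping a word (a string, one character per segment)
             to a positive frequency count.
    Assumption: information of segment i is
        -log2( mass(prefix + segment) / mass(prefix) ),
    where mass(p) is the summed frequency of all words starting with p.
    """
    # Summed frequency of every word sharing a given prefix (its cohort).
    prefix_mass = defaultdict(float)
    for word, freq in lexicon.items():
        for i in range(len(word) + 1):
            prefix_mass[word[:i]] += freq

    info = {}
    for word in lexicon:
        info[word] = [
            -math.log2(prefix_mass[word[: i + 1]] / prefix_mass[word[:i]])
            for i in range(len(word))
        ]
    return info


def mean_segment_information(lexicon):
    """Average the per-segment values over each (non-empty) word."""
    return {
        word: sum(bits) / len(bits)
        for word, bits in segment_information(lexicon).items()
    }


# Hypothetical toy lexicon with made-up token counts.
toy = {"kat": 8, "kab": 1, "dog": 7}
per_segment = segment_information(toy)
means = mean_segment_information(toy)
```

In this toy lexicon all three words have the same length, yet the rare word "kab" has a higher mean segment information than the frequent word "kat", which is the kind of length-controlled contrast the highlights describe. Because no word here is a prefix of another, each word's segment values sum to the negative log of its probability; a lexicon with such prefix pairs would need an explicit end-of-word symbol for that to hold.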

Summary

Introduction

Information for Low-Probability Words (King, Wedel)

Human languages are characterized by hierarchically organized, nested structure: utterances are composed of structured sequences of words, and words in turn are composed of structured sequences of sounds. [...] properties of interest, such as word length, can be straightforwardly measured, the lexicon has been a focus for much prior research on the role of biases toward efficient communication in shaping language patterns. Many of these studies conclude that patterns in the lexicon support the hypothesis that communicative efficiency is a driving pressure in the evolution of word-to-form mappings (Ferrer i Cancho & Solé, 2003; Kanwal, Smith, Culbertson, & Kirby, 2017; Piantadosi, Tily, & Gibson, 2009, 2012; Zipf, 1949).

