Abstract
Patterns of word use both reflect and influence a myriad of human activities and interactions. Like other entities that are reproduced and evolve, words rise or decline depending upon a complex interplay between their intrinsic properties and the environments in which they function. Using Internet discussion communities as model systems, we define the concept of a word niche as the relationship between the word and the characteristic features of the environments in which it is used. We develop a method to quantify two important aspects of the size of the word niche: the range of individuals using the word and the range of topics it is used to discuss. Controlling for word frequency, we show that these aspects of the word niche are strong determinants of changes in word frequency. Previous studies have already indicated that word frequency itself is a correlate of word success at historical time scales. Our analysis of changes in word frequencies over time reveals that the relative sizes of word niches are far more important than word frequencies in the dynamics of the entire vocabulary at shorter time scales, as the language adapts to new concepts and social groupings. We also distinguish endogenous versus exogenous factors as additional contributors to the fates of words, and demonstrate the force of this distinction in the rise of novel words. Our results indicate that short-term nonstationarity in word statistics is strongly driven by individual proclivities, including inclinations to provide novel information and to project a distinctive social identity.
Highlights
We have introduced two new quantities, D^U and D^T, which measure the dissemination of words across individuals (users) and across topics, respectively, and used them to characterize the vocabulary of two online discussion groups over a period of more than a decade.
We found that almost all words are concentrated with respect to both individuals and topics, and that at short time scales a word's concentration in the space of users and topics, as revealed by D^U and D^T, is a strong determinant of its fate.
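The text above does not spell out how the dissemination measures are computed. The following is a minimal sketch under a stated assumption, not the authors' definition: a word's dissemination is taken to be the number of distinct contexts (users, or topics/threads) in which it appears, divided by the number of contexts it would be expected to reach if its occurrences were scattered at random over all text. Dividing by that expectation is one way to control for word frequency. The function name `dissemination` and the input format are hypothetical.

```python
import math
from collections import Counter, defaultdict


def dissemination(posts):
    """Hypothetical frequency-normalized dissemination score per word.

    posts: iterable of (context_id, words) pairs, where context_id is a
    user (for a D^U-like score) or a topic/thread (for a D^T-like score).
    Returns {word: observed distinct contexts / expected distinct contexts}.
    ASSUMPTION: random-placement null model, not the paper's stated formula.
    """
    words_per_context = Counter()          # m_c: total words written in context c
    contexts_with_word = defaultdict(set)  # contexts where word w actually occurs
    word_counts = Counter()                # n_w: total occurrences of word w

    for ctx, words in posts:
        words_per_context[ctx] += len(words)
        for w in words:
            word_counts[w] += 1
            contexts_with_word[w].add(ctx)

    total_words = sum(word_counts.values())
    scores = {}
    for w, n_w in word_counts.items():
        f_w = n_w / total_words  # overall relative frequency of w
        # Under random placement, a context with m words contains no copy of w
        # with probability roughly exp(-f_w * m); sum the complements to get
        # the expected number of contexts that contain w at least once.
        expected = sum(1.0 - math.exp(-f_w * m) for m in words_per_context.values())
        scores[w] = len(contexts_with_word[w]) / expected
    return scores


# Two words with equal frequency: one spread over all users, one used by a single user.
posts = [
    ("a", ["common", "jargon", "jargon", "jargon", "jargon"]),
    ("b", ["common", "x1", "x2", "x3", "x4"]),
    ("c", ["common", "y1", "y2", "y3", "y4"]),
    ("d", ["common", "z1", "z2", "z3", "z4"]),
]
scores = dissemination(posts)
```

Passing thread or topic identifiers instead of user names as `context_id` gives the topic-side analogue. Under this assumed normalization, a score below 1 marks a word that is concentrated in fewer contexts than chance would predict, which is the sense in which most words are described above as concentrated.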
Summary
Much information about the fabric of modern human society has been gleaned from large-scale records of human communication activity, such as time stamps and network structures for email exchanges, mobile phone calls, and Internet activity [1,2,3,4]. The flow of words has the potential to be even more informative. Words characterize both external events and otherwise unobservable mental states. They tap into the variety of experience, knowledge, and goals of different interacting individuals. The word stream is information-dense because the number of distinct words and expressions is so great. The lexicon of a literate adult is estimated to contain over 100,000 distinct items [5], and it continues to grow as new words are encountered [6].