Methodology for using a Bayesian nonparametric model to uncover universal patterns in color naming

Kirbi Joe, Maryam Gooyabadi

doi:10.1016/j.mex.2021.101572

Abstract

Language is an integral part of society which enables communication among its members. To shed light on how words gain their meaning and how their meaning evolves over time, color naming is often used as a case study. The color domain can be defined by a physical space, making it a useful concept for studying denotation of meaning. Though humans can distinguish millions of colors, language provides us with a small, manageable set of terms for categorizing the space. Partitions of the color space vary across different language groups and evolve over time (e.g. new color terms may enter a language). Investigating universal patterns in color naming provides insight into the mechanisms that give rise to the observed data. Recently, computational techniques have been utilized to study this phenomenon. Here, we develop a methodology for transforming a color naming data set (namely, the World Color Survey) which is based on constraints imposed by the stimulus space. This transformed data is used to initialize a nonparametric Bayesian machine learning model in order to implement a culture- and theory-independent study of universal color naming patterns across different language groups. All of the methods described are executed by our Python software package called ColorBBDP.

• Data from the World Color Survey is transformed from its original format into binary feature vectors which can be given as input to the Beta-Bernoulli Dirichlet Process Mixture Model.
• This paper provides a specific application of Variational Inference on the Beta-Bernoulli Dirichlet Process Mixture Model to a color naming data set.
• New mathematical measures for performing post-cluster analyses are also detailed in this paper.
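As a rough illustration of what a binary feature vector for a color term might look like, the sketch below marks each stimulus chip with 1 if a term was applied to it and 0 otherwise. This is only a minimal sketch under stated assumptions: the function name `to_binary_vector`, the toy responses, and the six-chip stimulus set are hypothetical, and the encoding does not reproduce the stimulus-space constraints used by the actual transformation or the ColorBBDP interface.

```python
def to_binary_vector(chips_named, n_chips):
    """Return a 0/1 list with a 1 at every chip index the term was applied to."""
    named = set(chips_named)
    return [1 if i in named else 0 for i in range(n_chips)]


# Hypothetical toy data: each term maps to the chip indices it was used for.
# The real survey uses a much larger stimulus array; 6 chips keeps this readable.
responses = {
    "term_a": [0, 1],
    "term_b": [3, 4, 5],
}
n_chips = 6

# One binary feature vector per term, in insertion order of the dictionary.
features = [to_binary_vector(chips, n_chips) for chips in responses.values()]
```

Each row of `features` is then a fixed-length vector of Bernoulli-style observations, which is the general kind of input a Beta-Bernoulli mixture model expects.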

Full Text