Abstract
Acoustic correlates of speech can have empirical distributions that pose problems for the linear models we often use to analyze them. In some such cases, we have reason to believe that the data were produced by more than one distribution. One example is the distribution of voice onset time (VOT) in the phonologically voiced stop series /b, d, g/ in North American English. Although this series is most often produced as short-lag stops with a VOT near zero, it can also be produced with prevoicing, which yields negative VOT values. The result is a bimodal distribution of VOT values for these segments. In the first study of VOT, Lisker and Abramson (1964) wisely reported descriptive statistics for each of these pronunciation variants separately, recognizing that a single mean and range would have characterized the data poorly and could have misled readers. This tutorial introduces Bayesian finite mixture models, which model a dependent variable as a combination of two or more distributions. This is useful in situations like English VOT: with two distributions, short-lag and prevoiced VOT values can be modeled together in the same model. The tutorial begins with a brief conceptual explanation of these models, followed by a walkthrough of an analysis of simulated English VOT data using the R package “brms” (Bürkner, 2017). Particular emphasis is placed on setting priors, especially informative priors for the intercepts of the two distributions.
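To make the mixture idea concrete, the sketch below simulates bimodal VOT data from a two-component normal mixture, p(y) = θ·N(y | μ_short, σ_short) + (1 − θ)·N(y | μ_pre, σ_pre), and compares its log-likelihood with that of a single normal fitted to the same data. It is a plain-Python illustration of the model's structure, not the brms analysis described in the tutorial, and every parameter value (θ = 0.8, a 15 ms short-lag mean, a −90 ms prevoiced mean, and the standard deviations) is a hypothetical choice made for the example rather than an estimate from real data.

```python
import math
import random

def normal_pdf(y, mu, sigma):
    """Density of a normal distribution with mean mu and SD sigma at y."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def mixture_pdf(y, theta, mu_short, sd_short, mu_pre, sd_pre):
    """Two-component mixture: weight theta on short-lag, 1 - theta on prevoiced."""
    return (theta * normal_pdf(y, mu_short, sd_short)
            + (1 - theta) * normal_pdf(y, mu_pre, sd_pre))

def simulate_vot(n, theta, mu_short, sd_short, mu_pre, sd_pre, seed=1):
    """Draw n VOT values (ms): pick a component for each token, then sample from it."""
    rng = random.Random(seed)
    values = []
    for _ in range(n):
        if rng.random() < theta:
            values.append(rng.gauss(mu_short, sd_short))  # short-lag token
        else:
            values.append(rng.gauss(mu_pre, sd_pre))      # prevoiced token
    return values

# Hypothetical parameter values, chosen only for illustration.
PARAMS = dict(theta=0.8, mu_short=15.0, sd_short=8.0, mu_pre=-90.0, sd_pre=25.0)
vot = simulate_vot(1000, **PARAMS)

# Share of tokens in the prevoiced region (expected to be roughly 1 - theta).
share_prevoiced = sum(v < -30 for v in vot) / len(vot)

# Single-normal fit by maximum likelihood: sample mean and population SD.
single_mean = sum(vot) / len(vot)
single_sd = math.sqrt(sum((v - single_mean) ** 2 for v in vot) / len(vot))
single_ll = sum(math.log(normal_pdf(v, single_mean, single_sd)) for v in vot)

# Log-likelihood of the same data under the two-component mixture.
mixture_ll = sum(math.log(mixture_pdf(v, **PARAMS)) for v in vot)

print(f"prevoiced share: {share_prevoiced:.2f}")
print(f"single-normal mean: {single_mean:.1f} ms")
print(f"log-likelihood, single normal: {single_ll:.1f}")
print(f"log-likelihood, mixture:       {mixture_ll:.1f}")
```

With these made-up values, the single normal places its mean near 0.8·15 + 0.2·(−90) = −6 ms, between the two modes where few tokens actually occur, which is the kind of misleading summary Lisker and Abramson avoided. The mixture describes the same data with a much higher log-likelihood.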