UNIFORMITY IN PHONETIC REALIZATION: EVIDENCE FROM SIBILANT PLACE OF ARTICULATION IN AMERICAN ENGLISH

ELEANOR CHODROFF, University of York
COLIN WILSON, Johns Hopkins University

Phonetic realization is highly variable and highly structured within and across talkers. We examine three constraints that could structure the phonetic space of related speech sounds: target, contrast, and pattern uniformity. Target uniformity requires a uniform mapping from distinctive features to their corresponding phonetic targets within a talker, contrast uniformity requires a consistent difference in the phonetic targets that realize featural contrasts across talkers, and pattern uniformity requires a uniform template of phonetic targets across talkers. Focusing on American English sibilant fricatives, we measure and compare each constraint’s influence on the phonetic targets corresponding to place of articulation. We find that target uniformity is the strongest constraint: each talker realizes a given distinctive feature value in highly similar ways across related sounds. Together with similar findings for other sound classes, this result reveals fine-grained systematicity in the mapping from phonology to phonetics and has implications for theories of speech production and speech perception.*

Keywords: phonetic realization, sibilant fricatives, uniformity, talker variability, Bayesian models

1. Introduction. No one-to-one mapping exists between linguistic units and their phonetic instantiations (Liberman et al. 1967, Massaro 1975, Pisoni & Sawusch 1975). This lack of invariance is a fundamental issue for both the perception and production of language. From the perspective of perception, how do perceivers adapt to extensive variation in the physical signal (whether spoken or signed)? From the perspective of production, how do producers know the limits of acceptable variation for their particular language variety, or even just for intelligibility?
It is well established that variation in phonetic realization is extensive yet structured in many ways (Labov 1972, Miller 1994, Foulkes et al. 2001, Kleinschmidt & Jaeger 2015, Guy & Hinskens 2016, Sonderegger et al. 2020). In the present article, we explore potential constraints on the mapping from phonological representations, such as segments and their distinctive features, to targets of phonetic realization. We begin by considering a subinventory of two or more related sounds (e.g. [i] and [u], or [s] and [z]) and their corresponding phonetic targets (i.e. perceptuomotor representations). A given talker could structure the phonetic realization of these sounds by copying a pattern or template of targets that exists in the speech community and adapting it to their anatomy. This scenario allows for talker variation (one speaker’s realization of the template may be overall higher or lower on a given phonetic dimension), but otherwise it can be construed as ‘maximal phonetic structure’. Provided that the hypothetical template can be adapted to each speaker’s anatomy, this system would be fully general across the speaker population. Moreover, clear motivation for such a system exists in speech perception: if each talker has the same template of phonetic targets, perceptual adaptation would involve a simple translation of the pattern for each new talker. This is in fact assumed by many approaches to talker normalization and adaptation, especially for vowel systems (e.g. Lobanov 1971, Nearey 1978, Nearey & Assmann 2007).

* The authors would like to thank Shravan Vasishth for hosting the 2020 Potsdam Summer School on Statistical Methods in Linguistics and Psychology, Lisa Davidson for sharing the laboratory data, and Ryan Cotterell, Matthew Faytak, Josef Fruehwald, and Jane Stuart-Smith for helpful discussion. All data and analyses are available at https://osf.io/bysfa/. Printed with the permission of Eleanor Chodroff & Colin Wilson. © 2022.
In opposition to maximal phonetic structure in the speech inventory, we can consider ‘maximal phonetic bricolage’. Bricolage reflects the constellation of linguistic variables that a talker can exploit for expressing social identity (Eckert 2008, Zimman 2017); taken to the extreme, it would allow talkers to pick and choose phonetic targets independently for each sound. In this scenario, the phonetic space may be structured by overarching social variables, but it would be entirely unstructured within the subinventory and across speakers. For example, the relationship among the phonetic targets of sibilant fricatives like [s], [z], [ʃ], and [ʒ] could be different for each speaker, depending on how each target is chosen to express some aspect of social identity. Existing evidence points to an intermediate scenario between these two endpoints...
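The two endpoint scenarios can be made concrete with a minimal sketch in Python. It assumes a single phonetic dimension and a hypothetical community template for [s], [z], [ʃ], and [ʒ]; the numerical values, the talker offsets, and all function names are illustrative assumptions and are not taken from the data or analyses reported in this article. Under maximal phonetic structure, each talker's targets are the template shifted by one talker-specific offset, so removing the talker's overall level recovers the same pattern for every talker. Under maximal phonetic bricolage, each target is chosen independently, and no such shared pattern is expected.

```python
import random
from statistics import mean

# Hypothetical community template on one phonetic dimension (e.g. a spectral
# measure in Hz). These values are illustrative assumptions, not measurements.
TEMPLATE = {"s": 7000.0, "z": 6800.0, "ʃ": 4000.0, "ʒ": 3900.0}


def template_talker(offset):
    """Maximal structure: the shared template shifted by one talker offset."""
    return {sound: target + offset for sound, target in TEMPLATE.items()}


def bricolage_talker(rng, low=3000.0, high=8000.0):
    """Maximal bricolage: each target is drawn independently of the others."""
    return {sound: rng.uniform(low, high) for sound in TEMPLATE}


def center(targets):
    """Remove the talker's overall level by subtracting the mean target."""
    talker_mean = mean(targets.values())
    return {sound: value - talker_mean for sound, value in targets.items()}


def max_pattern_gap(talker_a, talker_b):
    """Largest per-sound difference between two centered target patterns."""
    a, b = center(talker_a), center(talker_b)
    return max(abs(a[sound] - b[sound]) for sound in TEMPLATE)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Two talkers who share the template but differ in overall level.
gap_template = max_pattern_gap(template_talker(-300.0), template_talker(450.0))

# Two talkers who each pick every target independently.
gap_bricolage = max_pattern_gap(bricolage_talker(rng), bricolage_talker(rng))

print(f"template gap:  {gap_template:.1f}")
print(f"bricolage gap: {gap_bricolage:.1f}")
```

In the template scenario the centered patterns coincide, which is the sense in which perceptual adaptation reduces to a simple translation. The mean-centering step is only one of several possible normalizations and is used here for simplicity; it is not the Bayesian modeling approach named in the keywords.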