In the present computational model of speech production, an utterance is represented as an organization of primitive linguistic units, gestures, into a larger structure, a gestural score. Each distinct gesture is linked to a particular subset of vocal tract variables (e.g., lip aperture and protrusion) and model articulators (e.g., lips and jaw), and is associated with a set of time-invariant dynamic parameters (e.g., lip aperture target, stiffness, and damping coefficients). The values of the dynamic parameters and their activation intervals are computed as part of the gestural score for a given utterance using a linguistic gestural model that includes a gesture-based dictionary of English syllables and a flexible rule interpreter for manipulating dynamic parameters and inter-gestural phasing. The gestural score serves as input to our task-dynamic model of sensorimotor coordination. In this model, the evolving configuration of the model articulators results from the gesturally and posturally specific way that driving influences generated in the tract-variable space are distributed across the associated sets of synergistic articulatory components. Coarticulation effects of various sorts are produced automatically as a function of the spatial and temporal overlap of two (or more) gestures. Significantly, explicit trajectory planning is not required, and the model functions in exactly the same way during simulations of unperturbed, mechanically perturbed, and coproduced speech gestures. [Work supported by NIH NS-13617, HD-01994, and NSF BNS 8520709.]
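To make the role of the time-invariant dynamic parameters concrete, the sketch below simulates a single tract variable (e.g., lip aperture) driven by gestures modeled as damped mass-spring (point-attractor) systems, with coproduced gestures blended during their temporal overlap. This is a minimal illustration, not the authors' implementation: the gesture fields, the simple parameter-averaging blend, and all numerical values are hypothetical.

```python
import numpy as np

def simulate_tract_variable(gestures, dt=0.001, duration=0.6):
    """Integrate one tract variable (e.g., lip aperture) driven by
    point-attractor gestures. Each gesture is a dict with hypothetical
    fields: target, stiffness, damping, onset, offset (seconds)."""
    n = int(duration / dt)
    z = np.zeros(n)   # tract-variable position over time
    v = 0.0           # tract-variable velocity
    z[0] = 10.0       # arbitrary initial aperture
    for i in range(1, n):
        t = i * dt
        active = [g for g in gestures if g["onset"] <= t < g["offset"]]
        if active:
            # Blend overlapping gestures by averaging their parameters
            # (an illustrative assumption, not the model's blending rule).
            k = np.mean([g["stiffness"] for g in active])
            b = np.mean([g["damping"] for g in active])
            z0 = np.mean([g["target"] for g in active])
            acc = -k * (z[i - 1] - z0) - b * v  # damped mass-spring dynamics
        else:
            acc = 0.0  # no active gesture: the variable coasts
        v += acc * dt
        z[i] = z[i - 1] + v * dt
    return z

# Two temporally overlapping lip-aperture gestures: their overlap yields a
# blended trajectory, a rough analogue of coarticulation without any
# explicit trajectory planning.
score = [
    {"target": 0.0,  "stiffness": 400.0, "damping": 40.0, "onset": 0.05, "offset": 0.30},
    {"target": 12.0, "stiffness": 400.0, "damping": 40.0, "onset": 0.25, "offset": 0.55},
]
trajectory = simulate_tract_variable(score)
```

Because each gesture is specified only by time-invariant parameters and an activation interval, the same integration loop handles isolated, overlapping, and (with an added external force term) perturbed gestures without any change to the control scheme.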