Abstract
Speech is a perceptuo-motor system. A natural computational modelling framework is provided by cognitive robotics, or more precisely speech robotics, which is likewise grounded in embodiment, multimodality, development, and interaction. This chapter describes the foundations of a virtual baby robot: an articulatory model that integrates the non-uniform growth of the vocal tract, a set of sensors, and a learning model. The articulatory model delivers the sagittal contour, lip shape and acoustic formants from seven input parameters that characterize the configurations of the jaw, the tongue, the lips and the larynx. To simulate the growth of the vocal tract from birth to adulthood, a scaling process modifies the longitudinal dimensions of the vocal tract as a function of age. The auditory system of the robot comprises a “phasic” subsystem that detects events over time and a “tonic” subsystem that tracks formants. The model of visual perception specifies the basic lip characteristics: height, width, area and protrusion. The orosensorial channel, which provides tactile sensations on the lips, the tongue and the palate, is implemented as a model that predicts tongue–palate contacts from articulatory commands. Learning relies on Bayesian programming, which proceeds in two phases: (i) specification, i.e. definition of the variables, decomposition of the joint distribution and identification of its free parameters through exploration of a learning set; and (ii) utilization, which answers questions posed to the joint distribution.
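To make the two-phase Bayesian-programming scheme concrete, a minimal sketch is given below. It poses a toy articulatory-to-auditory inversion problem: the single motor variable M, the single formant variable A, the stand-in vocal-tract mapping and all numerical values are illustrative assumptions, not the chapter's actual model.

```python
# Minimal sketch of the two-phase Bayesian-programming scheme described above.
# The variables (a motor parameter M, an auditory formant A), the toy
# "vocal tract" mapping and all numbers are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# --- Phase 1: specification -------------------------------------------------
# Variables: M (discretized motor command), A (continuous auditory formant).
# Decomposition of the joint distribution: P(M, A) = P(M) * P(A | M).
motor_values = np.linspace(-1.0, 1.0, 21)                      # support of M
prior_m = np.full(len(motor_values), 1.0 / len(motor_values))  # uniform P(M)

def vocal_tract(m):
    """Stand-in articulatory-to-acoustic mapping (pure assumption)."""
    return 500.0 + 300.0 * np.tanh(2.0 * m)

# Identification of the free parameters: explore a learning set of
# (motor command, perceived formant) pairs and fit a Gaussian P(A | M = m).
mu = np.empty(len(motor_values))
sigma = np.empty(len(motor_values))
for i, m in enumerate(motor_values):
    percepts = vocal_tract(m) + rng.normal(0.0, 20.0, size=200)  # noisy samples
    mu[i], sigma[i] = percepts.mean(), percepts.std()

# --- Phase 2: utilization ---------------------------------------------------
# A "question" posed to the joint distribution: given a heard formant a,
# compute P(M | A = a) ∝ P(M) * P(A = a | M), i.e. which command produced it.
def infer_motor(a):
    likelihood = np.exp(-0.5 * ((a - mu) / sigma) ** 2) / sigma
    posterior = prior_m * likelihood
    return posterior / posterior.sum()

posterior = infer_motor(650.0)
print("most probable motor command:", motor_values[np.argmax(posterior)])
```

The same pattern scales to the robot's richer setting: the specification phase would declare the seven articulatory parameters and the auditory, visual and tactile variables, choose a decomposition of their joint distribution, and identify its parameters from exploratory babbling data; the utilization phase would then answer questions such as inverting a perceived sound into candidate motor commands.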