Abstract

How accurately can simple algorithms identify segmental gestures by direct observation of articulatory movements in connected speech? We have made a preliminary examination of this problem using x-ray data on the movements of six metal pellets placed on the tongue, lip, mandible, and velum [Fujimura, Miller, and Kiritani, J. Acoust. Soc. Am. 60, S64 (A) (1976)]. The data pertained to natural utterances of English sentences, at a speed of 4–6 syllables/sec. It is relatively easy to identify visually the labial, palatal/velar, and interdental consonants, respectively. In most cases this can be done by merely setting an appropriate threshold for the pellet coordinate value of the most pertinent articulator, particularly if we know how many such events to expect in the given utterance. By combining such feature detectors we can identify /p, b/, /m/, /f, v/, /θ, ð/, /k, g/, /ŋ/, /ʃ/, and /r/ with high accuracy. Stressed syllables are mostly identifiable via mandible height. The domain of nasalization generally covers any string of nasal consonants and vowels uninterrupted by non-nasal true consonants. Data produced by two additional American speakers and similar data on Japanese are being examined. We thank Dr. S. Kiritani, University of Tokyo, for his cooperation.
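The thresholding scheme described above can be illustrated with a minimal sketch. The abstract does not give the authors' actual procedure, so everything below is a hypothetical reconstruction: a detector marks frames where one pellet coordinate (e.g. lower-lip height for labial closure) crosses a threshold, and the known number of expected events in the utterance is used to calibrate that threshold.

```python
from typing import List, Optional, Tuple

def detect_events(coord: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index pairs where the pellet coordinate
    stays at or above the threshold (e.g. a lip-closure gesture)."""
    events = []
    start = None
    for i, v in enumerate(coord):
        if v >= threshold:
            if start is None:
                start = i
        elif start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(coord) - 1))
    return events

def calibrate_threshold(coord: List[float], n_expected: int,
                        lo: float, hi: float, steps: int = 200) -> Optional[float]:
    """Scan candidate thresholds in [lo, hi] and return one that yields
    exactly the expected number of events, exploiting prior knowledge of
    how many such gestures the utterance contains (a hypothetical
    calibration step, not taken from the paper)."""
    for k in range(steps + 1):
        t = lo + (hi - lo) * k / steps
        if len(detect_events(coord, t)) == n_expected:
            return t
    return None  # no single threshold produces the expected count

# Synthetic lip-height trace with two closure gestures:
trace = [0.0, 0.1, 0.9, 0.8, 0.1, 0.0, 0.7, 0.1, 0.0]
print(detect_events(trace, 0.5))        # → [(2, 3), (6, 6)]
print(calibrate_threshold(trace, 2, 0.0, 1.0) is not None)  # → True
```

Separate detectors of this form, one per articulator coordinate, could then be combined by simple conjunction to separate consonant classes (labial vs. palatal/velar vs. interdental), in the spirit of the feature-detector combination the abstract describes.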

