Abstract

How accurately can simple algorithms identify segmental gestures by direct observation of articulatory movements in connected speech? We have made a preliminary examination of this problem using x-ray data on the movements of six metal pellets placed on the tongue, lip, mandible, and velum [Fujimura, Miller, and Kiritani, J. Acoust. Soc. Am. 60, S64 (A) (1976)]. The data pertained to natural utterances of English sentences, at a speed of 4–6 syllables/sec. It is relatively easy to identify visually the labial, palatal/velar, and interdental consonants, respectively. In most cases this can be done by merely setting an appropriate threshold for the pellet coordinate value of the most pertinent articulator, particularly if we know how many such events to expect in the given utterance. By combining such feature detectors we can identify /p, b/, /m/, /f, v/, /θ, ð/, /k, g/, /ŋ/, /ʃ/, and /r/ with high accuracy. Stressed syllables are mostly identifiable via mandible height. The domain of nasalization generally covers any string of nasal consonants and vowels uninterrupted by non-nasal true consonants. Data produced by two additional American speakers and similar data on Japanese are being examined. We thank Dr. S. Kiritani, University of Tokyo, for his cooperation.
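The thresholding scheme described above can be illustrated with a minimal sketch. The abstract does not give the authors' actual procedure, so everything below is a hypothetical reconstruction: a detector marks frames where one pellet coordinate (e.g. lower-lip height for labial closure) crosses a threshold, and the known number of expected events in the utterance is used to calibrate that threshold.

```python
from typing import List, Optional, Tuple

def detect_events(coord: List[float], threshold: float) -> List[Tuple[int, int]]:
    """Return (start, end) frame-index pairs where the pellet coordinate
    stays at or above the threshold (e.g. a lip-closure gesture)."""
    events = []
    start = None
    for i, v in enumerate(coord):
        if v >= threshold:
            if start is None:
                start = i
        elif start is not None:
            events.append((start, i - 1))
            start = None
    if start is not None:
        events.append((start, len(coord) - 1))
    return events

def calibrate_threshold(coord: List[float], n_expected: int,
                        lo: float, hi: float, steps: int = 200) -> Optional[float]:
    """Scan candidate thresholds in [lo, hi] and return one that yields
    exactly the expected number of events, exploiting prior knowledge of
    how many such gestures the utterance contains (a hypothetical
    calibration step, not taken from the paper)."""
    for k in range(steps + 1):
        t = lo + (hi - lo) * k / steps
        if len(detect_events(coord, t)) == n_expected:
            return t
    return None  # no single threshold produces the expected count

# Synthetic lip-height trace with two closure gestures:
trace = [0.0, 0.1, 0.9, 0.8, 0.1, 0.0, 0.7, 0.1, 0.0]
print(detect_events(trace, 0.5))        # → [(2, 3), (6, 6)]
print(calibrate_threshold(trace, 2, 0.0, 1.0) is not None)  # → True
```

Separate detectors of this form, one per articulator coordinate, could then be combined by simple conjunction to separate consonant classes (labial vs. palatal/velar vs. interdental), in the spirit of the feature-detector combination the abstract describes.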

