Abstract

It has been shown that face (lips, cheeks, and chin) information can account to a large extent for visual speech perception of isolated syllables and words. Visual speech synthesis has used reduced sets of phoneme classes (''visemes''), under the theory that perceivers are limited in their ability to extract visual speech information. In this study, lip configurations from a manually segmented sentence database [L. Bernstein et al., J. Acoust. Soc. Am. 107, 2887 (2000)] were analyzed to identify phoneme clusters that are algorithmically distinguishable using mouth vertical/horizontal opening and lip protrusion measured at the midpoint of each segment. The lip-feature sample space for each phoneme was represented by a Gaussian mixture model. Maximum posterior probability classification results were computed for each phoneme. Confusion matrices were generated from the classification results, and a set of confused phonemes with within-group correct classification of 74% or higher was judged to be a cluster. Preliminary results from 191 sentences by a single talker generated the following clusters: {/p, b, m/ (77%), /f, v/ (74%), /w, r/ (80%), /t, d, s, z, D, k, n/ (88%)}. We will present results analyzing the entire English phoneme set across different talkers and compare the results with visual perceptual clusters. [Work supported in part by the NSF.]
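The classification pipeline described above (per-phoneme Gaussian models over lip features, maximum-posterior classification, and a confusion matrix) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature values are synthetic, only three hypothetical phoneme classes are used, and each class is modeled by a single Gaussian rather than a full mixture, with uniform priors so maximum posterior reduces to maximum likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical lip-feature samples (vertical opening, horizontal opening,
# protrusion) for three illustrative phoneme classes; real data would come
# from measured mid-segment lip configurations.
means = {"p": [2.0, 8.0, 1.0], "f": [3.0, 7.0, 2.5], "w": [4.0, 4.0, 4.0]}
samples = {ph: rng.normal(mu, 0.6, size=(200, 3)) for ph, mu in means.items()}

def fit_gaussian(x):
    """Fit a single Gaussian (a one-component 'mixture') to feature samples."""
    return x.mean(axis=0), np.cov(x, rowvar=False)

def log_likelihood(x, mu, cov):
    """Log density of a multivariate normal evaluated at x."""
    d = x - mu
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d @ np.linalg.inv(cov) @ d + logdet + len(mu) * np.log(2 * np.pi))

models = {ph: fit_gaussian(x) for ph, x in samples.items()}
phones = list(models)

# Maximum-posterior classification (uniform priors, so max likelihood wins)
# and the resulting confusion matrix: rows are true phonemes, columns the
# classifier's choices.
conf = np.zeros((len(phones), len(phones)), dtype=int)
for i, ph in enumerate(phones):
    for x in samples[ph]:
        scores = [log_likelihood(x, *models[q]) for q in phones]
        conf[i, np.argmax(scores)] += 1

print(conf)
```

From such a matrix, phonemes whose confusions concentrate within a group (here, a within-group correct-classification rate of 74% or higher) would be merged into a single viseme cluster.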
