Statistical Lip-Appearance Models Trained Automatically Using Audio Information

Philippe Daubias, Paul Deléglise

doi:10.1155/s1110865702206186

Abstract

We aim at modeling the appearance of the lower face region to assist visual feature extraction for audio-visual speech processing applications. In this paper, we present a neural network based statistical appearance model of the lips which classifies pixels as belonging to the lips, skin, or inner mouth classes. This model requires labeled examples to be trained, and we propose to label images automatically by employing a lip-shape model and a red-hue energy function. To improve the performance of lip-tracking, we propose to use blue marked-up image sequences of the same subject uttering the identical sentences as natural nonmarked-up ones. The easily extracted lip shapes from blue images are then mapped to the natural ones using acoustic information. The lip-shape estimates obtained simplify lip-tracking on the natural images, as they reduce the parameter space dimensionality in the red-hue energy minimization, thus yielding better contour shape and location estimates. We applied the proposed method to a small audio-visual database of three subjects, achieving errors in pixel classification around 6%, compared to 3% for hand-placed contours and 20% for filtered red-hue.
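The abstract does not define the red-hue measure, so the following is only a minimal sketch of one plausible per-pixel redness score, not the authors' energy function. The names `red_hue_score` and `red_hue_map`, the use of HSV hue, and the saturation weighting are all assumptions made for illustration.

```python
import colorsys


def red_hue_score(r, g, b):
    """Return a redness score in [0, 1] for one 8-bit RGB pixel.

    Hypothetical measure: closeness of the HSV hue to pure red,
    weighted by saturation. Not the paper's actual definition.
    """
    h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    # Hue is circular: 0.0 and 1.0 both denote pure red.
    distance_to_red = min(h, 1.0 - h)          # lies in [0, 0.5]
    closeness = 1.0 - 2.0 * distance_to_red    # lies in [0, 1]
    # Grey pixels have no meaningful hue; saturation 0 sends them to 0.
    return closeness * s


def red_hue_map(image):
    """Apply red_hue_score to an image given as rows of (r, g, b) tuples."""
    return [[red_hue_score(*pixel) for pixel in row] for row in image]
```

Under this assumed measure, a saturated lip-like colour such as `(200, 80, 90)` scores higher than a skin-like colour such as `(220, 180, 150)`, which is the kind of contrast a red-hue energy term would exploit when fitting a contour.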

Highlights

Today, automatic speech recognition (ASR) works well for several applications, but performance depends highly on the specificity of the task, and on the type and level of surrounding noise.
Using visual information in unconstrained conditions requires having accurate visual feature extraction, regardless of the visual features used: (i) pixel-based features: images are fed directly into a speech recognition system [4, 5, 8, 13], after applying a few transformations or normalizations to the images (fixed-size region of interest (ROI) cropping, histogram normalization, for example); (ii) model-based features: a model is located on images, and parameters to be used for ASR are deduced from the location and shape of the model.
We present here a multilayer perceptron (MLP)-based statistical appearance model of the lips which classifies pixels as belonging to the lips, skin, or inner mouth classes.

Summary

INTRODUCTION

Automatic speech recognition (ASR) works well for several applications, but performance depends highly on the specificity of the task, and on the type and level of surrounding noise. We present here an MLP-based statistical appearance model of the lips which classifies pixels as belonging to the lips, skin, or inner mouth classes. Such an ANN requires labeled examples to be trained, and these may only be found on natural images. The lip-shape estimates obtained simplify lip-tracking on the natural images, as they reduce the parameter space dimensionality in the red-hue energy minimization, yielding better contour shape and location estimates. Such lip contours can be used to automatically label image blocks as belonging to one of the three classes of interest.
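The automatic labeling step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it assumes the lip contours are available as two closed polygons (an outer lip contour and an inner mouth contour), and it labels a block by testing its centre point. The function names and label strings are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the closed polygon.

    `polygon` is a list of (x, y) vertices in order; the last vertex is
    implicitly joined back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y matter.
        if (y1 > y) != (y2 > y):
            # x coordinate where this edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def label_block(cx, cy, outer_contour, inner_contour):
    """Label a block by its centre (cx, cy) as one of the three classes."""
    if point_in_polygon(cx, cy, inner_contour):
        return "inner_mouth"
    if point_in_polygon(cx, cy, outer_contour):
        return "lips"
    return "skin"


# Toy rectangular contours, for illustration only.
outer = [(0, 0), (10, 0), (10, 6), (0, 6)]
inner = [(3, 2), (7, 2), (7, 4), (3, 4)]
labels = [label_block(x, 3, outer, inner) for x in (5, 1, 12)]
```

Testing only the block centre is a simplification; blocks that straddle a contour could instead be discarded from the training set to avoid ambiguous labels.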

LIP APPEARANCE MODELING

Literature approaches

Statistical modeling of lip appearance

Training of the lip appearance model

LIP SHAPE MODELING

Lip-contour extraction in “blue” images

Shape model building

Shape model evaluation

LIP CONTOUR LOCATION ON NATURAL IMAGES

Joint lip-shape and location estimation

Cascade lip shape and location estimation using acoustic information

Use of acoustic information for lip shape estimation

Lip contour location estimation

The audio-visual database

The evaluation paradigm

Lip contour evaluation

Appearance model evaluation

Experimental results

Lip-shape estimation

Quality of location

Appearance model accuracy

SUMMARY AND DISCUSSION


Journal: EURASIP Journal on Advances in Signal Processing
Publication Date: Nov 28, 2002
Citations: 41
License type: cc-by
