Abstract

The paper overviews recent progress and challenges in a number of audiovisual speech processing technologies, with the main emphasis on the problem of automatic speech recognition. It is well known that visual channel information can improve automatic speech processing for human-computer interaction. To process such information and incorporate it into automatic systems, a number of steps are required that are surprisingly similar across speech technologies. Crucial above all is the issue of the feature representation of visual speech and its robust extraction. In addition, appropriate integration of the audio and visual representations is required, in order to ensure that the bimodal systems outperform audio-only baselines. These topics are discussed in detail in the talk, with particular emphasis on their application to the speech recognition problem in the challenging environments of automobiles and smart rooms.
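The abstract does not prescribe a specific integration scheme; as a minimal sketch of one common option, the code below illustrates feature-level (early) fusion by frame-synchronous concatenation of audio and visual feature vectors. The feature dimensions, frame rates, and interpolation step are illustrative assumptions, not the authors' method.

```python
import numpy as np

# Illustrative, assumed setup (not from the paper):
# audio features (e.g. MFCCs) at 100 frames/s, visual lip-region features at 30 frames/s.
audio_feats = np.random.randn(300, 13)   # 3 s of audio: 300 frames x 13 coefficients
visual_feats = np.random.randn(90, 20)   # 3 s of video: 90 frames x 20 visual features

# Upsample the visual stream to the audio frame rate by linear interpolation
# so the two streams are frame-synchronous before concatenation.
t_audio = np.linspace(0.0, 1.0, audio_feats.shape[0])
t_video = np.linspace(0.0, 1.0, visual_feats.shape[0])
visual_upsampled = np.stack(
    [np.interp(t_audio, t_video, visual_feats[:, d]) for d in range(visual_feats.shape[1])],
    axis=1,
)

# Early fusion: concatenate per-frame audio and visual vectors into one bimodal feature.
fused = np.concatenate([audio_feats, visual_upsampled], axis=1)
print(fused.shape)  # (300, 33) bimodal feature vectors, one per audio frame
```

The fused vectors could then be fed to a single recognizer; alternative schemes such as decision-level (late) fusion weight separate audio and visual model scores instead.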
