Abstract
The paper surveys recent progress and open challenges in several audiovisual speech processing technologies, with main emphasis on automatic speech recognition. It is well known that visual channel information can improve automatic speech processing for human-computer interaction. Automatically processing such information and incorporating it into automatic systems requires a number of steps that are surprisingly similar across speech technologies. Crucial above all is the feature representation of visual speech and its robust extraction. In addition, the audio and visual representations must be integrated appropriately, so that bimodal systems outperform audio-only baselines. These topics are discussed in detail in the talk, with particular attention to speech recognition in the challenging environments of automobiles and smart rooms.