Abstract

Human beings are highly effective at integrating multiple sources of uncertain information, and mounting evidence points to this integration being practically optimal in a Bayesian sense. Yet, in speech processing systems, the two central tasks of speech signal enhancement and of speech or phonetic-state recognition are often performed almost in isolation, with only estimates of mean values being exchanged between them. This paper describes concepts for enhancing the interface between these two systems, considering a range of appropriate probabilistic representations. Examples will illustrate how such interfaces can improve the quality of both components: on the one hand, more reliable pattern recognition can be attained, while on the other hand, enhanced signal quality is achieved when feeding back information from a pattern recognition stage to the signal preprocessing. This latter idea will be described using the example of twin-HMMs, audiovisual speech models that help to recover lost acoustic information by exploiting video data. Overall, it will be shown how broader, probabilistic interfaces between signal processing and pattern recognition can help to achieve better performance in real-world conditions, and to more closely approximate the Bayesian ideal of using all sources of information in accordance with their respective degrees of reliability.
