Abstract

Researchers working on human-machine interfaces realized nearly 50 years ago that automatic speech recognition (ASR) alone is not sufficient; linguistic knowledge must be imparted to the system so that the signal can ultimately be understood. A speech understanding system combines speech recognition (i.e., speech-to-symbols conversion) with natural language processing (i.e., symbol-to-meaning transformation) to achieve understanding. Speech understanding research dates back to the DARPA Speech Understanding Project in the early 1970s. However, large-scale efforts only began in earnest in the late 1980s, with government research programs in the U.S. and Europe providing the impetus. This work has resulted in many innovations, including novel approaches to natural language understanding (NLU) for speech input and techniques for integrating ASR and NLU. In the past decade, speech understanding systems have become major building blocks of conversational interfaces that enable users to access and manage information through spoken dialogue, incorporating language generation, discourse modeling, dialogue management, and speech synthesis. Today, we are at the threshold of developing multimodal interfaces, augmenting sound with sight and touch. This talk will highlight past work and speculate on the future. [Work supported by an industrial consortium of the MIT Oxygen Project.]
