Abstract

The Hearsay-II system, developed during the DARPA-sponsored five-year speech-understanding research program, represents both a specific solution to the speech-understanding problem and a general framework for coordinating independent processes to achieve cooperative problem-solving behavior. As a computational problem, speech understanding raises a large number of intrinsically interesting issues. Spoken sounds are produced by a long chain of successive transformations, from intentions, through semantic and syntactic structuring, to the audible acoustic waves that eventually result. As a consequence, interpreting speech means effectively inverting these transformations to recover the speaker's intention from the sound. At each step in the interpretive process, ambiguity and uncertainty arise. The Hearsay-II problem-solving framework reconstructs an intention from hypothetical interpretations formulated at various levels of abstraction. In addition, it allocates limited processing resources first to the most promising incremental actions. The final configuration of the Hearsay-II system comprises problem-solving components to generate and evaluate speech hypotheses, and a focus-of-control mechanism to identify potential actions of greatest value. Many of these specific procedures reveal novel approaches to speech problems. Most important, the system successfully integrates and coordinates all of these independent activities to resolve uncertainty and control combinatorics. Several adaptations of the Hearsay-II framework have already been undertaken in other problem domains, and it is anticipated that this trend will continue; many future systems will necessarily integrate diverse sources of knowledge to solve complex problems cooperatively. Discussed in this paper are the characteristics of the speech problem in particular, the special kinds of problem-solving uncertainty in that domain, the structure of the Hearsay-II system developed to cope with that uncertainty, and the relationship between Hearsay-II's structure and those of other speech-understanding systems. The paper is intended for the general computer science audience and presupposes no speech or artificial intelligence background.
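
To make the architecture summarized above concrete, the sketch below illustrates a blackboard-style control loop in the spirit the abstract describes: independent knowledge sources post hypotheses at several levels of abstraction on a shared blackboard, and a scheduler executes the most highly rated pending action first, mimicking focus of control. This is a minimal illustrative sketch, not the actual Hearsay-II implementation; the class and function names (Blackboard, Hypothesis, Action, word_ks) and the rating scheme are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Hypothesis:
    level: str     # e.g. "segment", "syllable", "word", "phrase" (illustrative levels)
    content: str   # the hypothesized interpretation at that level
    rating: float  # credibility estimate used by the scheduler

@dataclass
class Action:
    rating: float                        # scheduler's estimate of the action's value
    run: Callable[["Blackboard"], None]  # knowledge-source activation to execute

class Blackboard:
    """Shared store of hypotheses at several levels of abstraction."""
    def __init__(self) -> None:
        self.hypotheses: List[Hypothesis] = []

    def post(self, hyp: Hypothesis) -> None:
        self.hypotheses.append(hyp)

def control_loop(blackboard: Blackboard,
                 knowledge_sources: List[Callable[[Blackboard], List[Action]]],
                 steps: int = 10) -> None:
    """Collect candidate actions from every knowledge source, then execute
    only the most promising one per step (focus of control)."""
    for _ in range(steps):
        agenda = [a for ks in knowledge_sources for a in ks(blackboard)]
        if not agenda:
            break
        best = max(agenda, key=lambda a: a.rating)
        best.run(blackboard)

def word_ks(bb: Blackboard) -> List[Action]:
    """Toy knowledge source: proposes a word-level hypothesis from a
    syllable-level hypothesis, unless a word hypothesis already exists."""
    have_word = any(h.level == "word" for h in bb.hypotheses)
    actions: List[Action] = []
    for hyp in bb.hypotheses:
        if hyp.level == "syllable" and not have_word:
            actions.append(Action(
                rating=hyp.rating,
                run=lambda bb, h=hyp: bb.post(
                    Hypothesis("word", "hearsay", h.rating * 0.9))))
    return actions

if __name__ == "__main__":
    bb = Blackboard()
    bb.post(Hypothesis("syllable", "hir-say", 0.7))
    control_loop(bb, [word_ks])
    print([(h.level, h.content, h.rating) for h in bb.hypotheses])
```

In this toy run, the single syllable hypothesis triggers one word-level action, the scheduler executes it, and the loop terminates once no knowledge source has further work, which is the cooperative, incrementally focused behavior the abstract attributes to the full system.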
