Systems and methods for combining subword recognition and whole word recognition of a spoken input

Jean-Manuel Van Thong, Ernest Pusateri

doi:10.1121/1.2212629

Abstract

A computer-based detection (e.g., speech recognition) system combines a word decoder and a subword decoder to detect words (or phrases) in a spoken input provided by a user into a microphone connected to the detection system. The word decoder detects words by comparing an input pattern (e.g., of hypothetical word matches) to reference patterns (e.g., words). The subword decoder compares an input pattern (e.g., hypothetical word matches based on subword or phoneme recognition) to reference patterns (e.g., words) using a word pronunciation distance measure that indicates how close each input pattern is to matching each reference pattern. Using the generated pattern comparisons, the subword decoder sorts the source set of reference patterns by how close each reference pattern is to correctly matching the input pattern. The word decoder and the subword decoder each provide an N-best list of hypothetical matches to the spoken input. A list fusion module of the detection system selectively combines the two N-best lists to produce a final, combined N-best list that has a predefined number of matches.
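
The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the patent. Every name here is hypothetical, the pronunciation distance is assumed to be a plain edit distance over phoneme symbols, and list fusion is assumed to be a simple rank-by-rank interleave with duplicate removal. The abstract specifies neither the actual distance measure nor the selective combination rule.

```python
from typing import Dict, List, Sequence


def phoneme_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance over phoneme symbols (an assumed stand-in for the
    word pronunciation distance measure)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def subword_n_best(decoded: Sequence[str],
                   lexicon: Dict[str, List[str]],
                   n: int) -> List[str]:
    """Sort the reference words by closeness to the decoded phoneme
    sequence and keep the n closest (ties broken alphabetically)."""
    scored = [(word, phoneme_distance(decoded, pron))
              for word, pron in lexicon.items()]
    scored.sort(key=lambda item: (item[1], item[0]))
    return [word for word, _ in scored[:n]]


def fuse_n_best(word_list: List[str],
                subword_list: List[str],
                size: int) -> List[str]:
    """Assumed fusion rule: interleave the two ranked lists, skip
    duplicates, and stop at the predefined list size."""
    fused: List[str] = []
    if size <= 0:
        return fused
    for rank in range(max(len(word_list), len(subword_list))):
        for source in (word_list, subword_list):
            if rank < len(source) and source[rank] not in fused:
                fused.append(source[rank])
                if len(fused) == size:
                    return fused
    return fused


# Hypothetical lexicon and decoder outputs, for illustration only.
lexicon = {
    "boston": ["b", "ao", "s", "t", "ah", "n"],
    "austin": ["ao", "s", "t", "ah", "n"],
    "houston": ["hh", "y", "uw", "s", "t", "ah", "n"],
}
decoded_phonemes = ["b", "ao", "s", "t", "ih", "n"]
subword_hypotheses = subword_n_best(decoded_phonemes, lexicon, n=3)
word_hypotheses = ["austin", "boston"]  # stand-in for the word decoder's list
final_list = fuse_n_best(word_hypotheses, subword_hypotheses, size=3)
```

Interleaving by rank is only one possible design. A system could instead weight each decoder's scores or prefer one list when the decoders disagree, and the abstract does not say which approach is used.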

Full Text