Detection of unknown words in large vocabulary speech recognition.

Satoru Hayamizu, Katunobu Itou, Kazuyo Tanaka

doi:10.1250/ast.16.165

Open Access

https://doi.org/10.1250/ast.16.165

Abstract

This paper examines, through recognition and detection experiments, how vocabulary size relates to detection errors for unknown words in large vocabulary speech recognition. Although the relation between vocabulary size and recognition performance has been reported, the relation between vocabulary size and detection performance has not yet been studied, in particular for vocabularies of more than 1,000 words. Experiments were conducted using the speech material of speaker MAU from the ATR word speech database. The dictionary contains 40,000 entries taken from the Shinmeikai Japanese Language Dictionary. When the vocabulary size increases from 1,000 to 40,000 words, detection errors follow a trend similar to that of recognition errors. The increase in detection errors caused by the larger vocabulary is small for in-vocabulary words compared with the increase for out-of-vocabulary words. These results should be taken into account when designing large vocabulary speech recognition systems that include unknown word processing.
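The abstract distinguishes two cases of detection error: words within the vocabulary and words outside it. It does not define the detection method or the exact error measures, so the following is only a minimal sketch under assumed definitions. An in-vocabulary error is taken to be a known word wrongly flagged as unknown, and an out-of-vocabulary error is taken to be an unknown word accepted as known. The function name, the trial format, and the toy data are hypothetical and do not come from the paper.

```python
from typing import Iterable, Tuple


def detection_error_rates(trials: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Return (in_vocab_error_rate, oov_error_rate).

    Each trial is (in_vocabulary, flagged_unknown). Both the trial format
    and the error definitions are assumptions made for illustration.
    """
    iv_total = iv_errors = 0
    oov_total = oov_errors = 0
    for in_vocabulary, flagged_unknown in trials:
        if in_vocabulary:
            iv_total += 1
            # Assumed error: a known word is wrongly flagged as unknown.
            iv_errors += int(flagged_unknown)
        else:
            oov_total += 1
            # Assumed error: an unknown word is accepted as a known word.
            oov_errors += int(not flagged_unknown)
    iv_rate = iv_errors / iv_total if iv_total else 0.0
    oov_rate = oov_errors / oov_total if oov_total else 0.0
    return iv_rate, oov_rate


# Made-up toy trials, not data from the paper.
toy_trials = [
    (True, False), (True, False), (True, True), (True, False),
    (False, True), (False, False),
]
iv_rate, oov_rate = detection_error_rates(toy_trials)
print(f"in-vocabulary error rate: {iv_rate:.2f}")
print(f"out-of-vocabulary error rate: {oov_rate:.2f}")
```

Computing the two rates separately at each vocabulary size is what allows a comparison of how quickly each one grows as the vocabulary is enlarged.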
