Abstract

This paper concerns the self-improvement of a voice interface through acoustic model re-training on user-input spoken queries at the early stage of commercialization, when conventional confidence measure-based acoustic model re-training is unreliable. We analyze error patterns in user-input spoken queries, categorize them, define a quantitative measurement for each error category, and propose a filter-based approach over these measurements. The proposed method comprises four distinct filters: a filter on environmental noise level, a filter on the non-pitch ratio within an utterance, a filter on the average phoneme duration function score, and a filter on the clipped frame composition ratio. For evaluation, the initial acoustic model achieved a speech recognition rate of 66.1%. When all of the proposed filters are applied during re-training, the recognition rate rises to 73.8%, which is 3.1% higher than that of a confidence measure-based acoustic model re-training method. The proposed method is also applicable to other data-driven classification services in consumer electronics products operating on other media (e.g., images) at their early stage of commercialization.
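The four filters described above can be sketched as a simple conjunctive gate over pre-computed per-utterance measurements. This is a minimal illustrative sketch, not the paper's implementation: the field names, threshold values, and filter directions are assumptions, and the actual quantitative measurements and tuned thresholds are defined in the paper itself.

```python
from dataclasses import dataclass


@dataclass
class UtteranceStats:
    """Pre-computed measurements for one spoken query (names illustrative)."""
    noise_level_db: float          # estimated environmental noise level
    non_pitch_ratio: float         # fraction of frames with no detected pitch
    phoneme_duration_score: float  # average phoneme duration function score
    clipped_frame_ratio: float     # fraction of amplitude-clipped frames


def passes_filters(s: UtteranceStats,
                   max_noise_db: float = 40.0,
                   max_non_pitch: float = 0.6,
                   min_duration_score: float = 0.5,
                   max_clipped: float = 0.05) -> bool:
    """Accept an utterance for re-training only if all four filters pass.

    Threshold values here are placeholders, not the paper's tuned values.
    """
    return (s.noise_level_db <= max_noise_db
            and s.non_pitch_ratio <= max_non_pitch
            and s.phoneme_duration_score >= min_duration_score
            and s.clipped_frame_ratio <= max_clipped)


def select_for_retraining(stats: list) -> list:
    """Keep only the utterances judged clean enough for re-training."""
    return [s for s in stats if passes_filters(s)]


clean = UtteranceStats(30.0, 0.3, 0.8, 0.01)   # quiet, voiced, unclipped
noisy = UtteranceStats(55.0, 0.7, 0.4, 0.10)   # fails every filter
print(len(select_for_retraining([clean, noisy])))  # → 1
```

Utterances rejected by any single filter are excluded from the re-training set, which is how the method avoids reinforcing the acoustic model on corrupted queries.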
