Abstract
This paper concerns the self-improvement of a voice interface through acoustic model re-training on user-input spoken queries at the early stage of commercialization, when conventional confidence measure-based acoustic model re-training is not reliable. We analyze error patterns in user-input spoken queries, categorize them, define a quantitative measure for each category, and propose a filter-based approach over these measures. The proposed method comprises four distinct filters: a filter on environmental noise level, a filter on the non-pitch ratio within an utterance, a filter on the average phoneme duration function score, and a filter on the clipped frame composition ratio. In the evaluation, the initial acoustic model achieved a speech recognition rate of 66.1%; with all of the proposed filters applied to select re-training data, the rate rose to 73.8%, which is 3.1% higher than that of a confidence measure-based re-training method. The proposed method is also applicable to other data-driven classification services of consumer electronics products in other media (e.g., images) at their early stage of commercialization.
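To illustrate the filtering idea, the sketch below implements one of the four filters, the clipped frame composition ratio, as a data-selection gate. The frame length, clipping threshold, and rejection cutoff are illustrative assumptions, not values taken from the paper; the function names (`clipped_frame_ratio`, `keep_for_retraining`) are hypothetical.

```python
import numpy as np

def clipped_frame_ratio(samples, frame_len=400, clip_threshold=0.99):
    """Fraction of frames containing at least one clipped sample.

    `samples` is a 1-D float array normalized to [-1, 1]; a sample whose
    magnitude reaches `clip_threshold` of full scale is treated as clipped.
    Frame length (400 samples ~= 25 ms at 16 kHz) and threshold are
    illustrative choices, not values from the paper.
    """
    n_frames = len(samples) // frame_len
    if n_frames == 0:
        return 0.0
    frames = samples[: n_frames * frame_len].reshape(n_frames, frame_len)
    clipped = np.any(np.abs(frames) >= clip_threshold, axis=1)
    return float(clipped.mean())

def keep_for_retraining(samples, max_ratio=0.05):
    # Reject utterances whose clipped-frame ratio exceeds the cutoff,
    # so only acoustically clean queries enter the re-training set.
    return clipped_frame_ratio(samples) <= max_ratio
```

In the same spirit, the other three filters (noise level, non-pitch ratio, phoneme duration score) would each yield a scalar per utterance, and an utterance is kept only if it passes every filter.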