Abstract

Although the importance of contextual information in speech recognition has long been acknowledged, it has remained clearly underutilized, even in state-of-the-art speech recognition systems. This article introduces a novel, methodologically hybrid approach to the research question of context-dependent speech recognition in human–machine interaction. The approach is hybrid in that it integrates aspects of both the statistical and the representational paradigms. We extend the standard statistical pattern-matching approach with a cognitively inspired and analytically tractable model with explanatory power. This extension makes it possible to account for contextual information that is otherwise unavailable to speech recognition systems, and to use that information to improve the post-processing of recognition hypotheses. The article introduces an algorithm for evaluating recognition hypotheses, illustrates it for concrete interaction domains, and discusses its implementation within two prototype conversational agents.

Highlights

  • Human speech recognition makes extensive use of the listener’s expectations of what the speaker could say in the current situation

  • To mitigate the limitations of the statistical approach, we employ a representational approach based on the focus tree model of attentional information in human–machine interaction. Adaptations of this model have been successfully applied in several prototype conversational agents for natural language understanding and dialogue management [2,14,15,16]. In this contribution, we extend it to model and dynamically prioritize contextual information in automatic speech recognition

  • We extended the standard statistical pattern-matching approach to speech recognition with a novel, cognitively inspired and analytically tractable algorithm for evaluation of recognition hypotheses
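The summary does not spell out how recognition hypotheses are evaluated. As a rough illustration of the general idea of post-processing a recognizer's n-best list with contextual expectations, the following is a minimal Python sketch under stated assumptions. It is not the authors' focus-tree algorithm: the names `Hypothesis`, `context_score` and `rerank`, the keyword-to-priority dictionary standing in for attentional priorities, and the linear weighting are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    recognizer_score: float  # assumed: normalized recognizer confidence in [0, 1]


def context_score(text: str, expectations: dict[str, float]) -> float:
    """Hypothetical context model: the highest priority among expected
    keywords that occur in the hypothesis, or 0.0 if none occur."""
    tokens = set(text.lower().split())
    return max((p for word, p in expectations.items() if word in tokens), default=0.0)


def rerank(hypotheses: list[Hypothesis], expectations: dict[str, float],
           weight: float = 0.5) -> list[Hypothesis]:
    """Sort an n-best list by a linear mix of recognizer and context scores.
    `weight` (assumed) controls how strongly context overrides the recognizer."""
    def combined(h: Hypothesis) -> float:
        return (1 - weight) * h.recognizer_score + weight * context_score(h.text, expectations)
    return sorted(hypotheses, key=combined, reverse=True)


if __name__ == "__main__":
    n_best = [Hypothesis("turn on the lamb", 0.64), Hypothesis("turn on the lamp", 0.62)]
    expectations = {"lamp": 0.9, "light": 0.8}  # what the listener expects in this situation
    print(rerank(n_best, expectations)[0].text)  # prints "turn on the lamp"
```

With `weight=0.0` the sketch falls back to the recognizer's own ranking; raising it lets expectations about the current situation override small acoustic differences, which is the intuition the highlights describe.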


Summary

Introduction

Human speech recognition makes extensive use of the listener’s expectations of what the speaker could say in the current situation. It appears uncontroversial that, if we aim to reduce the existing performance gap between humans and conversational agents, our systems should integrate contextual information as well. Yet contextual information has remained clearly underutilized, even in state-of-the-art speech recognition systems. The article introduces a novel, methodologically hybrid approach to the research question of context-dependent speech recognition in human–machine interaction. The proposed approach is illustrated with two prototype conversational agents. The section ‘…Algorithm’ reports on a prototype speech recognition system and illustrates the introduced algorithm in practice. ‘Prototype B: Assessing the accuracy of the algorithm’ reports on another prototype speech recognition system and assesses the accuracy of the algorithm. ‘Conclusion’ discusses the computational appropriateness and generalizability of the proposed algorithm.


