Abstract

In this paper, we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the amount of human labour involved. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented, conversational, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for those dialogue systems and then presenting the evaluation methods for that class.

Highlights

  • As the amount of digital data continuously grows, users demand technologies that offer quick access to such data

  • We define the goal of an evaluation method as having an automated, repeatable evaluation procedure with high correlation to human judgments, one that can differentiate between various dialogue strategies and explain which features of the dialogue system are important

  • User satisfaction modelling: here, the assumption is that the usability of the system can be approximated by the satisfaction of its users, which can be measured by questionnaires


Summary

Introduction

As the amount of digital data continuously grows, users demand technologies that offer quick access to such data. Users rely on systems that support information-search interactions, such as Siri, Google Assistant, Amazon Alexa, or Microsoft XiaoIce (Zhou et al. 2018). These technologies, called Dialogue Systems (DS), allow the user to converse with a computer system using natural language. If we assume that a high-quality dialogue system is defined by its ability to respond with an appropriate utterance, it is not clear how to measure appropriateness or what appropriateness means for a particular system. This survey is structured as follows: we give a general overview of the different classes of dialogue systems and their characteristics.

Dialogue systems
Evaluation
Modular structure of this article
Characteristics
Dialogue structure
Technologies
Pipelined systems
End‐to‐end trainable systems
User satisfaction modelling
User simulation
Subsystems evaluation
Modelling conversational dialogue systems
Neural generative models
Utterance selection methods
Evaluation methods
General metrics for conversational dialogue systems
Utterance selection metrics
Question answering dialogue systems
Evaluation of QA dialogue systems
Evaluation datasets and challenges
Datasets for task‐oriented dialogue systems
Datasets for conversational dialogue systems
Datasets for question answering dialogue systems
Evaluation challenges
Challenges and future trends
Conclusion
Compliance with ethical standards
