Abstract

In this paper, we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process. Often, dialogue systems are evaluated by means of human evaluations and questionnaires. However, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the amount of human labour involved. In this survey, we present the main concepts and methods. For this, we differentiate between the various classes of dialogue systems (task-oriented, conversational, and question-answering dialogue systems). We cover each class by introducing the main technologies developed for those dialogue systems and then presenting the evaluation methods for that class.

Highlights

  • As the amount of digital data continuously grows, users demand technologies that offer quick access to such data

  • We define the goal of an evaluation method as having an automated, repeatable evaluation procedure with high correlation to human judgments, one that can differentiate between various dialogue strategies and explain which features of the dialogue system are important

  • User satisfaction modelling: here, the assumption is that the usability of the system can be approximated by the satisfaction of its users, which can be measured by questionnaires


Summary

Introduction

As the amount of digital data continuously grows, users demand technologies that offer quick access to such data. Users rely on systems that support information-search interactions, such as Siri, Google Assistant, Amazon Alexa, or Microsoft XiaoIce (Zhou et al. 2018). These technologies, called Dialogue Systems (DS), allow the user to converse with a computer system using natural language. If we assume that a high-quality dialogue system is defined by its ability to respond with an appropriate utterance, it is not clear how to measure appropriateness or what appropriateness means for a particular system. This survey is structured as follows: we give a general overview of the different classes of dialogue systems and their characteristics.

Dialogue systems
Evaluation
Modular structure of this article
Characteristics
Dialogue structure
Technologies
Pipelined systems
End‐to‐end trainable systems
User satisfaction modelling
User simulation
Subsystems evaluation
Modelling conversational dialogue systems
Neural generative models
Utterance selection methods
Evaluation methods
General metrics for conversational dialogue systems
Utterance selection metrics
Question answering dialogue systems
Evaluation of QA dialogue systems
Evaluation datasets and challenges
Datasets for task‐oriented dialogue systems
Datasets for conversational dialogue systems
Datasets for question answering dialogue systems
Evaluation challenges
Challenges and future trends
Conclusion
Compliance with ethical standards
