Survey of evaluation methods for dialogue systems}{Survey of evaluation methods for dialogue systems

Wei-Nan Zhang,Ting Liu,Yangzi Zhang

doi:10.1360/n112017-00125

Abstract

This paper introduces the history of dialogue systems and their evaluation methods. The evaluation methods are categorized as either task-oriented dialogue systems or open domain dialogue systems. This paper investigates and summarizes the different methods of evaluating dialogue systems, analyzes the pros and cons of the different methods, discusses the emphasis of each method, and presents the progress of recent research for the two categories. For task-oriented dialogue systems, this paper introduces the recent research results of Steve Young. In addition, this paper sums up several widely used evaluation approaches. The evaluation methods for open domain chatting systems are explored from two angles: objective index evaluation and simulated artificial scoring. The various indices and different research ideas are analyzed and introduced as well. Finally, through summarizing and analyzing classical evaluation methods of dialogue systems as well as the newer evaluation methods based on neural network models, this study aims to predict developmental trends in evaluation methods for dialogue systems.

Full Text