PONE

Tian Lan, Xian-Ling Mao, Heyan Huang, Xiaoyan Gao, Wei Wei

doi:10.1145/3423168

Abstract

Open-domain generative dialogue systems have attracted considerable attention over the past few years, yet evaluating them automatically remains a major challenge. As far as we know, there are three kinds of automatic evaluation metrics for open-domain generative dialogue systems: (1) word-overlap-based metrics; (2) embedding-based metrics; (3) learning-based metrics. Because no systematic comparison exists, it is not clear which kind of metric is more effective. In this article, we first systematically compare all three kinds of metrics to determine which is most effective. Extensive experiments demonstrate that learning-based metrics are the most effective evaluation metrics for open-domain generative dialogue systems. Moreover, we observe that nearly all learning-based metrics depend on the negative sampling mechanism, which yields extremely imbalanced and low-quality samples for training a score model. To address this issue, we propose PONE, a novel learning-based metric that significantly improves the correlation with human judgments by using augmented POsitive samples and valuable NEgative samples. Extensive experiments demonstrate that PONE significantly outperforms the state-of-the-art learning-based evaluation method. In addition, we have publicly released the code of our proposed metric and of the state-of-the-art baselines.
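To make the negative sampling mechanism mentioned in the abstract concrete, the following is a minimal sketch of how training pairs for a learned score model are commonly built: each true (context, response) pair is labeled 1, and responses drawn at random from other dialogues are labeled 0. This is an illustrative assumption, not the authors' released code; the function name `build_training_pairs`, the `num_negatives` parameter, and the toy dialogues are all hypothetical.

```python
import random


def build_training_pairs(dialogues, num_negatives=1, seed=0):
    """Build (context, response, label) triples with random negative sampling.

    Hypothetical helper for illustration only: the ground-truth response is
    the positive (label 1); responses taken from other dialogues are the
    negatives (label 0).
    """
    rng = random.Random(seed)
    pairs = []
    for i, (context, response) in enumerate(dialogues):
        # Positive sample: the response that actually followed this context.
        pairs.append((context, response, 1))
        # Candidate negatives: responses that belong to other contexts.
        others = [r for j, (_, r) in enumerate(dialogues) if j != i]
        for negative in rng.sample(others, k=min(num_negatives, len(others))):
            pairs.append((context, negative, 0))
    return pairs


# Toy data, invented for this example.
dialogues = [
    ("How are you?", "I'm fine, thanks."),
    ("What's the weather like?", "It is sunny today."),
    ("Do you like music?", "Yes, I love jazz."),
]
pairs = build_training_pairs(dialogues, num_negatives=1)
for context, response, label in pairs:
    print(label, "|", context, "->", response)
```

A randomly drawn response may be trivially unrelated to the context or, occasionally, a perfectly acceptable reply, which gives some intuition for why the abstract describes such samples as low-quality.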

Journal: ACM Transactions on Information Systems
Publication Date: Nov 13, 2020
Citations: 14

Similar Papers

EVA2.0: Investigating Open-domain Chinese Dialogue Systems with Large-scale Pre-training
Yuxian Gu ... Jiaxin Wen
Machine Intelligence Research | VOL. 20
18 Feb 2023

Generating Informative Dialogue Responses with Keywords-Guided Networks
Heng-Da Xu ... Jingjing Zhu
01 Jan 2020

Deconstruct to Reconstruct a Configurable Evaluation Metric for Open-Domain Dialogue Systems
Vitou Phy ... Yang Zhao
01 Jan 2020

Predictive Engagement: An Efficient Metric for Automatic Evaluation of Open-Domain Dialogue Systems
Sarik Ghazarian ... Nanyun Peng
Proceedings of the AAAI Conference on Artificial Intelligence | VOL. 34
03 Apr 2020
