Abstract

Evaluating human dialogue is a complex task, as our conversations are rarely structured. There are, however, cases where conversation does follow some structure; for example, in a typical call centre, the dialogue between an agent and a customer revolves around certain topics. Such dialogues can be evaluated against pre-specified criteria and sub-criteria. This evaluation is typically done manually, which is time-consuming and motivates the need for an automated system that employs Artificial Intelligence (AI) algorithms to evaluate dialogues efficiently. In this paper, we propose a novel dialogue-evaluation framework that leverages recent advances in deep learning. The contributions of this work are twofold. First, we introduce a straightforward end-to-end framework, CallAI, for evaluating dialogues in any domain against predefined hierarchical criteria. Second, we present a novel algorithm, TAABLM, which combines aspect-based learning with traditional TF-IDF text features. We show that TAABLM outperforms conventional baselines such as BERT and LSTM, delivering improved performance in automated dialogue evaluation, while CallAI offers a simple yet elegant framework for an AI-based solution to hierarchical dialogue evaluation. We demonstrate the efficacy of the proposed framework and algorithm on three datasets, quantifying performance in terms of an aggregated dialogue score as well as either accuracy or AUROC.
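To make the flavour of TAABLM's feature combination concrete, the following is a minimal, hypothetical sketch of early fusion between TF-IDF features and a simple per-utterance aspect signal. It is not the paper's implementation: the data, the one-hot aspect encoding (standing in for learned aspect representations), and the logistic-regression classifier are all illustrative assumptions.

```python
# Hypothetical sketch: fusing TF-IDF text features with an aspect signal
# for per-criterion utterance classification. NOT the paper's TAABLM model;
# every name and design choice here is an assumption for illustration.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import OneHotEncoder

utterances = [
    "Thank you for calling, how may I help you?",
    "I want to dispute a charge on my bill.",
    "Let me pull up your account details.",
]
aspects = [["greeting"], ["billing"], ["verification"]]  # one aspect label per utterance
labels = [1, 0, 1]  # e.g., whether the utterance satisfies some evaluation criterion

# Lexical view: TF-IDF features over the utterance text.
tfidf = TfidfVectorizer()
X_text = tfidf.fit_transform(utterances).toarray()

# Aspect view: one-hot encoding as a stand-in for a learned aspect embedding.
aspect_enc = OneHotEncoder()
X_aspect = aspect_enc.fit_transform(aspects).toarray()

# Early fusion: concatenate both views, then train a simple classifier.
X = np.hstack([X_text, X_aspect])
clf = LogisticRegression().fit(X, labels)
print(clf.predict(X))
```

In the actual algorithm, the aspect signal would presumably be a learned representation rather than a one-hot vector; the sketch only illustrates the idea of combining an aspect view with TF-IDF features in a single model.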
