Abstract

We apply the modular dialog system framework to combine open-domain question answering with a task-oriented dialog system. The resulting meta dialog system can answer questions from Wikipedia and at the same time act as a personal assistant. The aim of this system is to combine the strengths of an open-domain question answering system with the conversational power of task-oriented dialog systems. After explaining the technical details of the system, we compose a new dataset from standard datasets to evaluate it. We further introduce an evaluation method for this system. Using this method, we compare the performance of the non-modular systems with that of the modular system and show that the modular dialog system framework is well suited to this combination of conversational agents: the performance of each agent decreases only marginally in the modular setting.

Highlights

  • Nehring and Ahmed (2021) defined a modular dialog system (MDS) as a dialog system that consists of multiple modules

  • If we split the Stanford Question Answering Dataset (SQuAD) randomly, module selection might overfit on statistical cues, learning that the word Super Bowl is a hint that an utterance is aimed at the open-domain question answering (ODQA) system

  • We found that many questions from SQuAD are not answerable in the ODQA scenario
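The splitting concern in the highlights can be made concrete: if whole Wikipedia articles are assigned to either the training or the test split, lexical cues such as "Super Bowl" cannot leak between splits. The sketch below is an illustrative assumption, not the authors' exact procedure, and the toy data layout (a `title` field per example) is hypothetical:

```python
import random

# Hypothetical sketch: split SQuAD-style examples by Wikipedia article
# rather than randomly, so that article-specific vocabulary (e.g.
# "Super Bowl") never appears in both train and test.

def split_by_article(examples, test_fraction=0.2, seed=13):
    """Assign whole articles to either the train or the test split."""
    titles = sorted({ex["title"] for ex in examples})
    rng = random.Random(seed)
    rng.shuffle(titles)
    n_test = max(1, int(len(titles) * test_fraction))
    test_titles = set(titles[:n_test])
    train = [ex for ex in examples if ex["title"] not in test_titles]
    test = [ex for ex in examples if ex["title"] in test_titles]
    return train, test

examples = [
    {"title": "Super_Bowl_50", "question": "Who won Super Bowl 50?"},
    {"title": "Super_Bowl_50", "question": "Where was Super Bowl 50 played?"},
    {"title": "Warsaw", "question": "What is the capital of Poland?"},
    {"title": "Oxygen", "question": "What is the atomic number of oxygen?"},
    {"title": "Amazon_rainforest", "question": "How large is the Amazon rainforest?"},
]

train, test = split_by_article(examples)
# No article title appears in both splits.
assert not {ex["title"] for ex in train} & {ex["title"] for ex in test}
```

Because entire articles move to one side of the split, a module-selection classifier cannot memorize article-specific words from the training set and exploit them at test time.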


Summary

Introduction

Nehring and Ahmed (2021) defined a modular dialog system (MDS) as a dialog system that consists of multiple modules. We use this framework to combine a task-oriented dialog system (TODS) with an open-domain question answering (ODQA) system. A TODS can not only answer questions but also understand other kinds of user queries: it can, for example, recognize a greeting and respond in kind, which an ODQA system cannot do. Another strength of a TODS is the ability to create complex dialogs spanning multiple turns using a dialog manager. We apply the MDS framework to the combination of an ODQA system and a TODS. Our evaluation method inspects module selection, the ODQA system, and the TODS individually and measures how the performance of each changes from the non-modular to the modular scenario. We show that the performance drop is very small because module selection performs very well in our setup, with an f1-measure of 0.964.
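To make the role of module selection concrete, the sketch below routes each utterance to either the ODQA or the TODS module and scores the routing with an f1-measure. The keyword heuristic and the toy utterances are illustrative assumptions; the paper's module selection is a trained classifier, not this rule:

```python
# Minimal sketch of module selection between an ODQA module and a TODS
# module. The routing rule here is a hypothetical heuristic used only
# to show how routing decisions can be scored with an f1-measure.

QUESTION_WORDS = ("who", "what", "when", "where", "why", "how", "which")

def select_module(utterance: str) -> str:
    """Route an utterance to 'ODQA' or 'TODS'."""
    text = utterance.lower().strip()
    if text.endswith("?") and text.startswith(QUESTION_WORDS):
        return "ODQA"
    return "TODS"

def f1(gold, pred, positive="ODQA"):
    """F1-measure for the positive class."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

utterances = [
    ("Who won Super Bowl 50?", "ODQA"),
    ("What is the capital of Poland?", "ODQA"),
    ("Hello there!", "TODS"),
    ("Set an alarm for 7 am.", "TODS"),
]
gold = [label for _, label in utterances]
pred = [select_module(text) for text, _ in utterances]
print(f1(gold, pred))  # 1.0 on this toy data
```

In the full system, a misrouted utterance is answered by the wrong module, so a high module-selection f1 (0.964 in the paper's setup) is what keeps the per-agent performance drop small in the modular setting.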

Conversational Agents
Frankenbot
Evaluation measures
Datasets
Hybrid Dialog Systems
Creating a combined dataset for question answering and intent recognition
Modular Dialog System
Modules
Evaluation of the single modules
Evaluation of module selection
Evaluation of the full system
Questions that are answerable in ODQA
Results of module evaluation
Results of Module Selection
Results of the full system evaluation
Error analysis of module selection
Conclusion
Future Work
