Abstract
We apply the modular dialog system framework to combine open-domain question answering with a task-oriented dialog system. The resulting meta dialog system can answer questions from Wikipedia and at the same time act as a personal assistant. The aim is to combine the strength of an open-domain question answering system with the conversational power of task-oriented dialog systems. After explaining the technical details of the system, we construct a new dataset from standard datasets to evaluate it. We further introduce an evaluation method for this system. Using this method, we compare the performance of the non-modular system with that of the modular system. We show that the modular dialog system framework is well suited to this combination of conversational agents and that the performance of each agent decreases only marginally in the modular setting.
Highlights
Nehring and Ahmed (2021) defined a modular dialog system (MDS) as a dialog system that consists of multiple modules
If we split the Stanford Question Answering Dataset (SQuAD) randomly, the module selection might overfit on statistical cues, for example learning that the phrase "Super Bowl" is a hint that an utterance is aimed at the open-domain question answering (ODQA) system
We found that many questions from SQuAD are not answerable in the ODQA scenario
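The concern about random splits can be illustrated with a minimal sketch. It assumes, as one possible remedy not confirmed by the text above, that questions are grouped by their source article so that no article contributes to both the training and the test set. The function name `split_by_article`, the `(title, question)` input format, and the parameters are hypothetical.

```python
import random
from collections import defaultdict


def split_by_article(examples, test_fraction=0.2, seed=0):
    """Split (article_title, question) pairs so no article lands in both sets.

    Hypothetical helper: the input format and the 80/20 default are
    assumptions made for illustration only.
    """
    # Group questions by the article they were written for.
    by_title = defaultdict(list)
    for title, question in examples:
        by_title[title].append(question)

    # Shuffle article titles (not individual questions) reproducibly.
    titles = sorted(by_title)
    random.Random(seed).shuffle(titles)

    # Hold out whole articles for the test set.
    n_test = max(1, int(len(titles) * test_fraction))
    test_titles = set(titles[:n_test])

    train = [(t, q) for t in titles if t not in test_titles for q in by_title[t]]
    test = [(t, q) for t in titles if t in test_titles for q in by_title[t]]
    return train, test
```

With such a split, a topic word like "Super Bowl" that appears in training questions does not reappear in the test questions of the same article, so a classifier cannot score well merely by memorizing it.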
Summary
Nehring and Ahmed (2021) defined a modular dialog system (MDS) as a dialog system that consists of multiple modules. We use this framework to combine a task-oriented dialog system (TODS) with an open-domain question answering (ODQA) system. A TODS can not only answer questions but also understand other user queries. For example, the TODS can understand a greeting and respond with a greeting, something an ODQA system cannot do. Another strength of a TODS is its ability to conduct complex dialogs spanning multiple turns using a dialog manager. Our evaluation method inspects the module selection, the ODQA system, and the TODS individually and measures how the performance of each changes from the non-modular to the modular scenario. We show that the performance drop is very low because the module selection performs very well in our setup, with an F1 score of 0.964.
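The routing idea described above can be sketched as follows. This is an illustrative toy under stated assumptions, not the system from the paper: the two modules are stubs, `select_module` is a hand-written rule standing in for a trained module selection classifier, and `f1` computes a plain binary F1 score for one class, which may differ from the averaging the authors used.

```python
from typing import Callable, Dict, List

Module = Callable[[str], str]


def tods(utterance: str) -> str:
    # Hypothetical stand-in for a task-oriented dialog system.
    return "Hello! How can I help?"


def odqa(utterance: str) -> str:
    # Hypothetical stand-in for an open-domain QA system over Wikipedia.
    return "(answer retrieved from Wikipedia)"


MODULES: Dict[str, Module] = {"tods": tods, "odqa": odqa}


def select_module(utterance: str) -> str:
    # Toy rule: utterances starting with a question word go to ODQA.
    # A real module selection would be a trained classifier.
    question_words = ("who", "what", "when", "where", "which", "how")
    return "odqa" if utterance.lower().startswith(question_words) else "tods"


def respond(utterance: str) -> str:
    # The MDS forwards the utterance to the selected module only.
    return MODULES[select_module(utterance)](utterance)


def f1(gold: List[str], pred: List[str], positive: str) -> float:
    # Binary F1 for one module label, treating it as the positive class.
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because each utterance reaches only the selected module, every module selection error becomes an error of the overall system, which is why a high module selection score keeps the drop from the non-modular to the modular setting small.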