Abstract
Task-oriented dialog (TOD) systems often need to formulate knowledge base (KB) queries corresponding to the user intent and use the query results to generate system responses. Existing approaches require dialog datasets to explicitly annotate these KB queries, and such annotations can be time-consuming and expensive. In response, we define the novel problems of predicting the KB query and training the dialog agent without explicit KB query annotation. For query prediction, we propose a reinforcement learning (RL) baseline, which rewards the generation of those queries whose KB results cover the entities mentioned in subsequent dialog. Further analysis reveals that correlation among query attributes in the KB can significantly confuse memory augmented policy optimization (MAPO), an existing state-of-the-art RL agent. To address this, we improve the MAPO baseline with simple but important modifications suited to our task. To train the full TOD system for our setting, we propose a pipelined approach: it independently predicts when to make a KB query (query position predictor), then predicts a KB query at the predicted position (query predictor), and uses the results of the predicted query in subsequent dialog (next response predictor). Overall, our work proposes first solutions to our novel problem, and our analysis highlights the research challenges in training TOD systems without query annotation.
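As a concrete, hypothetical illustration of the coverage-based reward described above, the sketch below executes a candidate attribute-value query against a KB and rewards it by the fraction of entities mentioned in the subsequent dialog that appear in the query's result set. The function names, KB schema, and "name" field are assumptions made for illustration and are not the paper's actual implementation.

```python
from typing import Dict, List, Set

def execute_query(query: Dict[str, str], kb: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return KB rows that satisfy every attribute-value constraint in the query."""
    return [row for row in kb if all(row.get(attr) == val for attr, val in query.items())]

def coverage_reward(query: Dict[str, str],
                    kb: List[Dict[str, str]],
                    subsequent_entities: Set[str]) -> float:
    """Reward a candidate query by the fraction of entities mentioned later in
    the dialog that are covered by its KB result set (hypothetical sketch)."""
    retrieved = {row["name"] for row in execute_query(query, kb)}  # assumes a "name" attribute
    if not subsequent_entities:
        return 0.0
    return len(subsequent_entities & retrieved) / len(subsequent_entities)
```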
Highlights
Task-oriented dialog (TOD) systems often need to formulate knowledge base (KB) queries corresponding to the user intent and use the query results to generate system responses.
The presence or absence of the expensive price range in a query can retrieve almost the same set of KB entities and therefore yield similar rewards, which can confuse the weakly supervised query predictor. To counter this issue, we present a baseline solution for KB query prediction by extending an existing policy optimization technique, memory augmented policy optimization (MAPO; Liang et al., 2018). A toy illustration of this confusion follows these highlights.
As we have gold annotations, we evaluate the KB query predictor and the position predictor separately, in addition to the overall TOD system.
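The confusion from correlated KB attributes mentioned in the highlights can be seen with a small hypothetical example: if every Italian restaurant in a toy KB happens to be expensive, queries with and without the price-range constraint retrieve identical entities, earn identical coverage rewards, and are therefore indistinguishable to the weakly supervised learner. The KB contents and the retrieve helper below are made up for illustration.

```python
# Toy KB in which cuisine and price range are perfectly correlated (hypothetical data).
kb = [
    {"name": "roma_trattoria", "cuisine": "italian", "price": "expensive"},
    {"name": "venezia_house",  "cuisine": "italian", "price": "expensive"},
    {"name": "noodle_bar",     "cuisine": "chinese", "price": "cheap"},
]

def retrieve(query, kb):
    """Names of KB rows matching every attribute-value constraint in the query."""
    return {row["name"] for row in kb
            if all(row.get(attr) == val for attr, val in query.items())}

# Both queries return the same entities, so a coverage-based reward cannot
# separate the intended query from the under-specified one.
assert retrieve({"cuisine": "italian", "price": "expensive"}, kb) == \
       retrieve({"cuisine": "italian"}, kb)
```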
Summary
Task-oriented dialog (TOD) systems often need to formulate knowledge base (KB) queries corresponding to the user intent and use the query results to generate system responses. We propose a reinforcement learning (RL) baseline, which rewards the generation of those queries whose KB results cover the entities mentioned in subsequent dialog. End-to-end TOD systems (Reddy et al., 2019; Wu et al., 2019; Raghu et al., 2019) do not require state annotations, only KB query annotations. There exist approaches (Chen et al., 2013, 2015) to induce state annotations in SDS, but we are the first to induce query annotations in end-to-end TOD systems. TOD systems cannot be learned with just the state annotations; an additional state-to-KB-query mapping/annotation is required. No further annotations are needed to learn a TOD system when query annotations are available.
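The pipelined approach summarized above can be sketched as three stages applied at every system turn: decide whether a KB query is needed, predict the query, and generate the next response conditioned on any query results. The class and method names below are assumed interfaces for illustration, not the authors' code.

```python
from typing import Dict, List, Optional

class PipelinedTOD:
    """Minimal sketch of the three-stage pipeline: query position predictor,
    query predictor, and next response predictor (hypothetical interfaces)."""

    def __init__(self, position_predictor, query_predictor, response_predictor, kb):
        self.position_predictor = position_predictor  # decides when to fire a KB query
        self.query_predictor = query_predictor        # predicts the query at that position
        self.response_predictor = response_predictor  # generates the next system utterance
        self.kb = kb

    def next_response(self, dialog_history: List[str]) -> str:
        kb_results: Optional[List[Dict[str, str]]] = None
        if self.position_predictor.should_query(dialog_history):
            query = self.query_predictor.predict(dialog_history)
            kb_results = [row for row in self.kb
                          if all(row.get(a) == v for a, v in query.items())]
        return self.response_predictor.generate(dialog_history, kb_results)
```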