With the development of pre-trained language models (PrLMs), research on PrLM-based multi-domain task-oriented dialogue (TOD) systems has attracted growing attention and achieved great progress. However, most current studies suffer from two problems. First, they model dialogue state tracking as an independent subtask supervised by slot-value pairs, resulting in poor adaptability when transferring to new task domains. Second, they ignore the fact that, as dialogues grow longer, not all of the dialogue history remains valuable for the ensuing turns. To tackle these two issues, we propose a simple and novel framework that explores multi-domain TOD by jointly training response generation and dialogue summarization with a PrLM as the backbone. Specifically, first, we replace formatted dialogue states with fluent text generated by the dialogue summarization model, treating dialogue state identification as a natural language generation task, which allows dialogue state tracking to be easily extended to new task domains. Second, the dialogue summary removes redundant and irrelevant information from the dialogue so far and is fed into the response decoder, enabling the system to focus on crucial details in long dialogues. Furthermore, we employ a dialogue chunk detector to assist the dialogue summarization model and design a fusion mechanism to dynamically integrate helpful dialogue summaries into the response generation process. Experimental results on two public datasets show that the proposed model achieves state-of-the-art performance in both automatic and human evaluations and demonstrate that, within the joint framework, the two tasks benefit from each other.
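The abstract does not specify how the fusion mechanism integrates the dialogue summary into response generation; below is a minimal sketch of one plausible gated-fusion module, assuming PyTorch. The class name `SummaryFusion`, the sigmoid gate formulation, and the hidden size are illustrative assumptions, not the paper's actual design.

```python
# Minimal sketch of gated fusion between dialogue-history and summary encodings.
# Assumptions: PyTorch, both encodings share the same hidden size, and the
# summary encoding is aligned/pooled to the history sequence length.
import torch
import torch.nn as nn


class SummaryFusion(nn.Module):
    """Dynamically mixes the summary encoding into the history encoding."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        # Gate computed from the concatenation of the two representations.
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, history: torch.Tensor, summary: torch.Tensor) -> torch.Tensor:
        # history, summary: (batch, seq_len, hidden_size)
        g = torch.sigmoid(self.gate(torch.cat([history, summary], dim=-1)))
        # Element-wise gate controls how much summary information flows into
        # the representation handed to the response decoder.
        return g * summary + (1.0 - g) * history


# Usage sketch with random tensors standing in for encoder outputs.
fusion = SummaryFusion(hidden_size=768)
history_enc = torch.randn(2, 64, 768)
summary_enc = torch.randn(2, 64, 768)
fused = fusion(history_enc, summary_enc)  # (2, 64, 768), passed to the decoder
```

The per-dimension gate lets the model fall back to the raw dialogue history when the generated summary is unhelpful, which matches the abstract's notion of integrating the summary "dynamically"; the exact gating granularity used in the paper may differ.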