The human–computer conversation system is a significant application in the field of multimedia. To select an appropriate response, retrieval-based systems model the matching between the dialogue history and response candidates. However, most of the existing methods cannot fully capture and utilize varied matching patterns, which may degrade the performance of the systems. To address the issue, a densely enhanced semantic network (DESN) is proposed in our work. Given a multi-turn dialogue history and a response candidate, DESN first constructs the semantic representations of sentences from the word perspective, the sentence perspective, and the dialogue perspective. In particular, the dialogue perspective is a novel one introduced in our work. The dependencies between a single sentence and the whole dialogue are modeled from the dialogue perspective. Then, the response candidate and each utterance in the dialogue history are made to interact with each other. The varied matching patterns are captured for each utterance–response pair by using a dense matching module. The matching patterns of all the utterance–response pairs are accumulated in chronological order to calculate the matching degree between the dialogue history and the response. The responses in the candidate pool are ranked with the matching degree, thereby returning the most appropriate candidate. Our model is evaluated on the benchmark datasets. The experimental results prove that our model achieves significant and consistent improvement when compared with other baselines.
Read full abstract