Multi-turn response selection is an important branch of natural language processing that aims to select the most appropriate response given a multi-turn dialogue context. Most state-of-the-art models adopt pre-trained language models (PrLMs) and multiple auxiliary tasks to enhance their ability to understand the semantics of multi-turn dialogue. However, critical challenges remain. Optimizing multiple auxiliary tasks simultaneously can significantly increase the training cost, and the semantic gap between the optimization objectives of the main and auxiliary tasks may introduce noise into the PrLMs. To address these challenges, we propose an efficient BERT-based neural network model with local context comprehension (BERT-LCC) for multi-turn response selection. First, we propose a self-supervised learning strategy that introduces an auxiliary task named Response Prediction in Random Sliding Windows (RPRSW). Given a multi-turn dialogue, the RPRSW task takes the utterances falling within a random sliding window as input and predicts whether the last utterance in the window is an appropriate response to the local dialogue context. This auxiliary task enhances BERT's understanding of local semantic information. Second, we propose a local information fusion (LIF) mechanism that collects multi-granularity local features at different dialogue stages and employs a gating function to fuse global features with local features. Third, we introduce a simple but effective domain learning strategy to learn rich semantic information at different dialogue stages during pre-training. Experimental results on two public benchmark datasets show that BERT-LCC outperforms other state-of-the-art models.
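As a rough illustration of the RPRSW auxiliary task described above, the following Python sketch shows how positive and negative training instances might be sampled from a multi-turn dialogue. The window-size range, the 50/50 negative-sampling scheme, and the function name `sample_rprsw_instance` are illustrative assumptions rather than the authors' released implementation.

```python
import random

def sample_rprsw_instance(dialogue, utterance_pool, min_win=2, max_win=5):
    """Sample one RPRSW training instance from a multi-turn dialogue.

    dialogue:       list of utterance strings in chronological order
    utterance_pool: utterances drawn from other dialogues (used as negatives)
    Returns (local_context, candidate_response, label).
    NOTE: window-size range and negative-sampling ratio are assumptions.
    """
    assert len(dialogue) >= min_win, "dialogue too short for the chosen window"

    # Choose a random window size and a random start position.
    win = random.randint(min_win, min(max_win, len(dialogue)))
    start = random.randint(0, len(dialogue) - win)
    window = dialogue[start:start + win]

    # The local context is everything in the window except its last utterance;
    # that last utterance is the candidate "response" for the local context.
    context, response = window[:-1], window[-1]

    if random.random() < 0.5:
        return context, response, 1                      # positive: true next utterance
    return context, random.choice(utterance_pool), 0     # negative: randomly sampled
```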
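The abstract does not specify the exact form of the LIF gating function. A common formulation, sketched below in PyTorch purely as an assumption, computes a sigmoid gate from the concatenated global and local features and mixes the two representations element-wise.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse a global feature with an aggregated local feature via a learned gate.

    Generic gated-fusion layer, assumed for illustration; the exact gating
    function used in BERT-LCC may differ.
    """
    def __init__(self, hidden_size: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, global_feat: torch.Tensor, local_feat: torch.Tensor) -> torch.Tensor:
        # g in (0, 1) decides, per dimension, how much of each feature to keep.
        g = torch.sigmoid(self.gate(torch.cat([global_feat, local_feat], dim=-1)))
        return g * global_feat + (1.0 - g) * local_feat

# Example: fuse a [CLS]-style global vector with a pooled local-window vector.
fusion = GatedFusion(hidden_size=768)
fused = fusion(torch.randn(4, 768), torch.randn(4, 768))  # shape: (4, 768)
```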