Abstract

Coarse-grained response selection is a fundamental and essential subsystem for widely used retrieval-based chatbots, aiming to recall a coarse-grained candidate set from a large-scale dataset. The dense retrieval technique has recently proven very effective for building such a subsystem. However, dialogue dense retrieval models face two problems in real scenarios: (1) the multi-turn dialogue history is re-encoded at every turn, leading to inefficient inference; (2) the storage of the offline index is enormous, significantly increasing the deployment cost. To address these problems, we propose an efficient coarse-grained response selection subsystem consisting of two novel methods. Specifically, to address the first problem, we propose Hierarchical Dense Retrieval, which caches rich multi-vector representations of the dialogue history and encodes only the latest user utterance, leading to better inference efficiency. To address the second problem, we design Deep Semantic Hashing, which reduces the index storage while largely preserving recall accuracy. Extensive experimental results demonstrate the advantages of the two proposed methods over previous works. Specifically, with limited performance loss, our proposed coarse-grained response selection model achieves over 5x FLOPs speedup and over a 192x storage compression ratio. Moreover, our source code has been publicly released.
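To make the storage argument concrete, the sketch below illustrates the general idea behind semantic hashing for retrieval: dense float vectors are mapped to short binary codes, and candidates are ranked by Hamming distance instead of inner product. This is a minimal illustration, not the paper's actual Deep Semantic Hashing model; the random sign projection, the 768-dimensional embeddings, and the 128-bit code size are assumptions chosen so that the float32-to-binary ratio happens to match the 192x figure quoted above (768 × 4 bytes vs. 128 / 8 bytes).

```python
import numpy as np

def binarize(embeddings: np.ndarray, projection: np.ndarray) -> np.ndarray:
    """Map float vectors to {0,1} codes via a (hypothetical) random projection + sign."""
    return (embeddings @ projection > 0).astype(np.uint8)

def hamming_distances(query_code: np.ndarray, index_codes: np.ndarray) -> np.ndarray:
    """Number of differing bits between the query code and every index code."""
    return np.count_nonzero(index_codes != query_code, axis=1)

rng = np.random.default_rng(0)
dim, code_bits, n = 768, 128, 1000          # assumed sizes, not from the paper

projection = rng.normal(size=(dim, code_bits))
index_emb = rng.normal(size=(n, dim)).astype(np.float32)
index_codes = binarize(index_emb, projection)

# Retrieve: a query identical to item 42 has Hamming distance 0 to its own code.
query_code = binarize(index_emb[42][None, :], projection)[0]
top5 = np.argsort(hamming_distances(query_code, index_codes))[:5]

# Storage per item: float32 index vs. packed binary code.
float_bytes = dim * 4            # 3072 bytes
binary_bytes = code_bits // 8    # 16 bytes -> 192x compression
```

In a real system the binary codes would come from a learned hashing layer rather than a random projection, and the packed codes would be compared with XOR + popcount for speed; the storage arithmetic, however, is the same.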
