Caching Historical Embeddings in Conversational Search

Ophir Frieder, Ida Mele, Raffaele Perego, Franco Maria Nardini, Cristina Ioana Muntean, Nicola Tonellotto

doi:10.1145/3578519

Abstract

Rapid response, namely, low latency, is fundamental in search applications; it is particularly so in interactive search sessions, such as those encountered in conversational settings. One observation with the potential to reduce latency is that conversational queries exhibit temporal locality in the lists of documents they retrieve. Motivated by this observation, we propose and evaluate a client-side document embedding cache that improves the responsiveness of conversational search systems. By leveraging state-of-the-art dense retrieval models to abstract document and query semantics, we cache the embeddings of documents retrieved for a topic introduced in the conversation, as they are likely relevant to successive queries. Our document embedding cache implements an efficient metric index, answering nearest-neighbor similarity queries by estimating the approximate result sets returned. We demonstrate the efficiency achieved using our cache via reproducible experiments based on Text Retrieval Conference Conversational Assistant Track datasets, achieving a hit rate of up to 75% without degrading answer quality. These high cache hit rates significantly improve the responsiveness of conversational systems while also reducing the number of queries handled by the search back-end.
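The caching idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `EmbeddingCache`, the toy `CORPUS`, the `brute_force_backend` function, and the `epsilon` threshold are all hypothetical, and the hit test (a query counts as covered when it lies at least `epsilon` inside the ball spanned by an earlier back-end result list) is an assumption made for illustration, since the abstract does not specify how hits are decided. A linear scan also stands in for the metric index mentioned in the abstract.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy corpus of document embeddings (hypothetical data).
CORPUS = {
    "d1": (0.0, 0.0),
    "d2": (0.1, 0.0),
    "d3": (0.0, 0.2),
    "d4": (1.0, 1.0),
    "d5": (5.0, 5.0),
}


def brute_force_backend(query, k):
    """Stand-in for the remote search back-end: exact k-NN over CORPUS."""
    ranked = sorted(CORPUS.items(), key=lambda item: dist(query, item[1]))
    return ranked[:k]


class EmbeddingCache:
    """Client-side cache of document embeddings (illustrative sketch only)."""

    def __init__(self, backend, k_backend=4, epsilon=0.1):
        self.backend = backend      # callable(query, k) -> [(doc_id, vector)]
        self.k_backend = k_backend  # documents fetched per back-end query
        self.epsilon = epsilon      # assumed safety margin for the hit test
        self.docs = {}              # doc_id -> cached embedding
        self.past = []              # (query vector, radius of its result list)
        self.hits = 0
        self.misses = 0

    def _covered(self, query):
        # Assumed hit test: the query lies at least epsilon inside the ball
        # covered by the results of some earlier back-end query.
        return any(r - dist(query, q) >= self.epsilon for q, r in self.past)

    def search(self, query, k):
        if self._covered(query):
            self.hits += 1
        else:
            self.misses += 1
            results = self.backend(query, self.k_backend)
            for doc_id, vector in results:
                self.docs[doc_id] = vector
            if results:
                radius = max(dist(query, vector) for _, vector in results)
                self.past.append((query, radius))
        # Answer from the cached embeddings with a linear nearest-neighbor scan.
        ranked = sorted(self.docs.items(), key=lambda item: dist(query, item[1]))
        return [doc_id for doc_id, _ in ranked[:k]]


cache = EmbeddingCache(brute_force_backend)
first = cache.search((0.0, 0.0), k=2)     # new topic: goes to the back-end
second = cache.search((0.02, 0.0), k=2)   # follow-up query: served locally
third = cache.search((5.0, 5.0), k=1)     # topic shift: back-end again
print(first, second, third, cache.hits, cache.misses)
```

In this sketch the follow-up query is answered without contacting the back-end because it falls well inside the region already covered by the first result list, which is the temporal locality the abstract refers to. The third query lies far from that region and triggers a new back-end request.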


Journal: ACM Transactions on the Web
Publication Date: Oct 8, 2024
Citations: 2

