Abstract

Information retrieval (IR) is a field of study concerned with gathering, organizing, storing, analyzing, and accessing information. It underpins every application in which information must be searched for over the internet. The most important task in IR is the selection and ranking of relevant web documents that satisfy a user's information need in response to a query. To better capture the user's intent and form a more effective query, the initial query is reformulated by adding new terms. The semantic text-similarity techniques proposed in earlier research demand large amounts of labeled training data as well as user intervention, both of which are scarce, hard to capture, and difficult to maintain. In general, these techniques ignore contextual information and word order, which leads to problems such as data sparsity and dimensionality explosion. Deep learning techniques are now being used to measure text similarity, and several sentence embedding models can represent complete sentences and their semantics as vectors, helping a search engine comprehend the context, intent, and different aspects of the user's text. In this paper, a sentence embedding based query reformulation (QR) approach is proposed to improve document ranking performance using the universal sentence encoder (USE) and a cosine similarity measure. All experiments are performed on four standard datasets: CACM, CISI, ADI, and Medline. The results demonstrate that the USE-based QR system outperforms the SBERT sentence embedding model by 4.48%, 5.97%, 7.2%, and 2.1% on the ADI, CISI, CACM, and Medline datasets, respectively.
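To make the ranking step concrete, the sketch below shows how sentence embeddings and cosine similarity can be combined to score documents against a query. This is a minimal illustration, not the paper's full QR pipeline (the term-addition step is omitted); it assumes the publicly available Universal Sentence Encoder module on TensorFlow Hub, and the toy documents, example query, and helper function names are illustrative only.

```python
import tensorflow_hub as hub
import numpy as np

# Load the Universal Sentence Encoder (USE) from TensorFlow Hub.
# The module maps each input string to a 512-dimensional vector.
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_documents(query, documents):
    """Rank documents by cosine similarity of their USE embeddings to the query."""
    vectors = embed([query] + documents).numpy()
    query_vec, doc_vecs = vectors[0], vectors[1:]
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    # Highest-scoring (most semantically similar) documents first.
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)

# Hypothetical usage with toy documents and a reformulated query.
docs = [
    "Information retrieval systems rank documents by relevance to a query.",
    "Deep learning models capture sentence-level semantics.",
    "The weather today is sunny with a light breeze.",
]
for doc, score in rank_documents("semantic document ranking with sentence embeddings", docs):
    print(f"{score:.3f}  {doc}")
```

In this sketch the same encoder embeds both the (reformulated) query and the candidate documents, so relevance is judged in a shared semantic space rather than by term overlap, which is the property the USE-based QR approach relies on.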
