Abstract

Systematic literature reviews (SLRs) are a central part of evidence-based research, in which empirical evidence on specific research questions is collected and integrated. A key step in this process is building Boolean search queries, which are at the core of the information retrieval systems that support literature search. This involves turning general research aims into specific search terms that can be combined into complex Boolean expressions. Researchers must build and refine search queries to ensure sufficient coverage and proper representation of the literature. In this paper, we propose an adaptive query generation and refinement pipeline for SLR search that uses reinforcement learning to learn the optimal modifications to a query based on feedback from researchers about its performance. Empirical evaluations with 10 SLR datasets show that our approach achieves performance comparable to queries manually composed by SLR authors. We also investigate the impact of design decisions on the performance of the query generation and refinement pipeline. Specifically, we study the effects of the type of input seed, the use of general versus domain-specific word embedding models, the sampling strategy for relevance feedback, and the number of iterations in the refinement process. Our results provide insights into how these choices affect the pipeline's performance.
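To make the kind of loop described above concrete, the following is a minimal, hypothetical sketch of feedback-driven query refinement: a bandit-style agent chooses among query-modification actions with an epsilon-greedy policy, scores each candidate query against relevance feedback (here a toy recall measure over a handful of documents), and updates its value estimates from the observed reward. The action set, synonym table, and scoring function are illustrative assumptions, not the authors' pipeline.

```python
import random

# Toy synonym table standing in for a word embedding model (assumption).
SYNONYMS = {"cancer": ["neoplasm", "tumour"], "screening": ["detection"]}

def toy_recall(terms, relevant_docs):
    """Fraction of relevant documents matched by at least one query term."""
    hits = sum(1 for doc in relevant_docs if any(t in doc for t in terms))
    return hits / len(relevant_docs)

def refine(terms, relevant_docs, iterations=30, epsilon=0.3, seed=0):
    """Epsilon-greedy refinement: try a modification, keep it if recall
    does not drop, and update the per-action value estimate."""
    rng = random.Random(seed)
    actions = ["add_synonym", "drop_term"]
    value = {a: 0.0 for a in actions}   # running mean reward per action
    count = {a: 0 for a in actions}
    terms = set(terms)
    score = toy_recall(terms, relevant_docs)
    for _ in range(iterations):
        # Explore a random action with probability epsilon, else exploit.
        if rng.random() < epsilon:
            action = rng.choice(actions)
        else:
            action = max(actions, key=lambda a: value[a])
        candidate = set(terms)
        if action == "add_synonym":
            base = rng.choice(sorted(candidate))
            syns = SYNONYMS.get(base, [])
            if syns:
                candidate.add(rng.choice(syns))
        elif action == "drop_term" and len(candidate) > 1:
            candidate.discard(rng.choice(sorted(candidate)))
        new_score = toy_recall(candidate, relevant_docs)
        reward = new_score - score
        count[action] += 1
        value[action] += (reward - value[action]) / count[action]
        if new_score >= score:  # keep the modification if not worse
            terms, score = candidate, new_score
    return terms, score
```

Because modifications are only kept when recall does not decrease, the score is monotone non-decreasing over iterations; a real pipeline would instead balance recall against precision when judging a candidate query.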
