Abstract

Systematic reviews (SRs) are a rigorous method for synthesizing empirical evidence to answer specific research questions. However, they are labor-intensive because of their collaborative nature, strict protocols, and typically large number of documents. Large language models (LLMs) and their applications such as gpt-4/ChatGPT have the potential to reduce the human workload of the SR process while maintaining accuracy. We propose a new hybrid methodology that combines the strengths of LLMs and humans using the ability of LLMs to summarize large bodies of text autonomously and extract key information. This is then used by a researcher to make inclusion/exclusion decisions quickly. This process replaces the typical manually performed title/abstract screening, full-text screening, and data extraction steps in an SR while keeping a human in the loop for quality control. We developed a semi-automated LLM-assisted (Gemini-Pro) workflow with a novel innovative prompt development strategy. This involves extracting three categories of information including identifier, verifier, and data field (IVD) from the formatted documents. We present a case study where our hybrid approach reduced errors compared with a human-only SR. The hybrid workflow improved the accuracy of the case study by identifying 6/390 (1.53%) articles that were misclassified by the human-only process. It also matched the human-only decisions completely regarding the rest of the 384 articles. Given the rapid advances in LLM technology, these results will undoubtedly improve over time.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call