Abstract

Background and Aims

The rapidly growing scientific literature poses a significant challenge for researchers seeking to distill key insights. We used Retrieval-Augmented Generation (RAG), a novel AI-driven approach, to efficiently process and extract meaningful information from the published literature on uremic toxins. RAG is a general AI framework for improving the quality of responses generated by Large Language Models (LLMs) by supplementing the LLM's internal representation of information with curated expert knowledge.

Method

First, we collected from PubMed all abstracts related to the topic "uremic toxins" using Metapub, a Python library designed to facilitate fetching metadata from PubMed. Second, we set up a RAG system comprising two steps. In the retrieval step, the questions on the topic ("uremic toxins") and the documents (all collected abstracts and manuscripts) are encoded into vectors (i.e., high-dimensional numerical representations), and similarity measures are used to find the documents that best match each question. In the augmented generation step, the LLM (e.g., ChatGPT) uses these best-matching documents to generate a coherent and informed response. (Illustrative code sketches of both steps are given after the Conclusion.)

Results

We collected 3497 abstracts from PubMed and 191 expert-curated publications in PDF format related to the topic "uremic toxins". These 191 publications were broken down into 5756 documents, each with a manageable amount of text. The final vector database comprised 9253 vectors. Using RAG, we requested responses from the LLM to multiple questions related to "uremic toxins"; some examples are shown in Table 1. The first and second responses given by the LLM are reasonable. However, the third answer shows the phenomenon of 'hallucination', where models generate plausible and convincing-sounding yet factually incorrect information.

Conclusion

The use of RAG improves the capability of LLMs to answer questions by leveraging the information contained in curated abstracts and publications. Despite the improvements with RAG, the phenomenon of 'hallucination' persists. A concerning feature of hallucinations is their eloquent and convincing language. For the time being, LLM output, even when improved with RAG, requires scrutiny and human verification.
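The following is a minimal sketch of the collection step described in the Method, assuming Metapub's PubMedFetcher interface (pmids_for_query, article_by_pmid); the query string and retmax value are illustrative and not taken from the abstract.

```python
# Sketch: collect PubMed abstracts on "uremic toxins" via Metapub.
from metapub import PubMedFetcher

fetcher = PubMedFetcher()

# Retrieve PMIDs matching the topic, then fetch each record's abstract text.
pmids = fetcher.pmids_for_query("uremic toxins", retmax=5000)

abstracts = {}
for pmid in pmids:
    article = fetcher.article_by_pmid(pmid)
    if article.abstract:  # some PubMed records have no abstract
        abstracts[pmid] = article.abstract

print(f"Collected {len(abstracts)} abstracts")
```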
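The following is a minimal sketch of the two-step RAG pipeline described in the Method. The embedding model (all-MiniLM-L6-v2), the example document chunks, and the prompt template are assumptions for illustration; the abstract does not specify which encoder, vector store, or prompt the authors used.

```python
# Sketch: retrieval (vector encoding + similarity search) followed by
# augmented generation (prompt handed to an LLM such as ChatGPT).
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical corpus: text chunks derived from abstracts and PDF publications.
documents = [
    "Indoxyl sulfate is a protein-bound uremic toxin ...",
    "p-Cresyl sulfate accumulates in chronic kidney disease ...",
    # ... in the study, 9253 chunks made up the vector database
]

# Step 1 (retrieval): encode the documents once, then encode the question
# and rank documents by cosine similarity.
doc_vectors = encoder.encode(documents)

def retrieve(question: str, top_k: int = 5) -> list[str]:
    q_vec = encoder.encode([question])
    scores = cosine_similarity(q_vec, doc_vectors)[0]
    best = np.argsort(scores)[::-1][:top_k]
    return [documents[i] for i in best]

# Step 2 (augmented generation): prepend the best-matching chunks to the
# question and send the combined prompt to the LLM of choice.
def build_prompt(question: str) -> str:
    context = "\n\n".join(retrieve(question))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

prompt = build_prompt("Which protein-bound uremic toxins are most studied?")
# response = llm.generate(prompt)  # placeholder: call the chosen LLM API here
```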
