Biomedical Information Retrieval Research Articles

Background: Mining the vast pool of biomedical literature to extract accurate responses and relevant references is challenging due to the domain's interdisciplinary nature, specialized jargon, and continuous evolution. Early natural language processing (NLP) approaches often led to incorrect answers because they failed to comprehend the nuances of natural language. Transformer models have since advanced the field significantly by enabling the creation of large language models (LLMs), which enhance question-answering (QA) tasks. Despite these advances, current LLM-based solutions for specialized domains like biology and biomedicine still struggle to generate up-to-date responses while avoiding “hallucination,” that is, generating plausible but factually incorrect responses.

Results: Our work focuses on enhancing prompts using a retrieval-augmented architecture to guide LLMs in generating meaningful responses for biomedical QA tasks. We evaluated two approaches: one relying on text embedding and vector similarity in a high-dimensional space, and our proposed method, which uses explicit signals in user queries to extract meaningful contexts. For robust evaluation, we tested these methods on 50 specific and challenging questions from diverse biomedical topics, comparing their performance against a baseline model, BM25. The retrieval performance of our method was significantly better than that of the others, achieving a median Precision@10 of 0.95, where Precision@10 is the fraction of the top 10 retrieved chunks that are relevant. We used GPT-4, OpenAI's most advanced LLM, to maximize answer quality and manually assessed the LLM-generated responses. Our method achieved a median answer quality score of 2.5, surpassing both the baseline model and the text embedding-based approach. We developed a QA bot, WeiseEule (https://github.com/wasimaftab/WeiseEule-LocalHost), which utilizes these methods for comparative analysis and also offers advanced features for review writing and identifying relevant articles for citation.

Conclusions: Our findings highlight the importance of prompt enhancement methods that utilize explicit signals in user queries over traditional text embedding-based approaches to improve LLM-generated responses for specialized queries in specialized domains such as biology and biomedicine. By providing users complete control over the information fed into the LLM, our approach addresses some of the major drawbacks of existing web-based chatbots and LLM-based QA systems, including hallucinations and the generation of irrelevant or outdated responses.
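
The Precision@10 metric mentioned in the abstract can be illustrated with a minimal sketch. The function name `precision_at_k`, the question labels, and the relevance judgments below are hypothetical and are not taken from the study or from WeiseEule; the sketch only assumes that each retrieved chunk has been judged relevant or not, and that the median is taken across questions.

```python
from statistics import median


def precision_at_k(relevance, k=10):
    """Return the fraction of the top-k retrieved chunks judged relevant.

    `relevance` is a ranked list of booleans (best-ranked chunk first).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Chunks ranked below position k are ignored; if fewer than k chunks
    # were retrieved, the missing positions count as misses.
    return sum(1 for is_relevant in relevance[:k] if is_relevant) / k


# Hypothetical relevance judgments for three questions (not data from the study).
judgments = {
    "q1": [True] * 10,
    "q2": [True] * 9 + [False],
    "q3": [True] * 8 + [False] * 2,
}

# One Precision@10 value per question, then the median across questions.
per_question = [precision_at_k(ranked) for ranked in judgments.values()]
median_p_at_10 = median(per_question)
```

Dividing by `k` rather than by the number of chunks actually returned is one common convention; a system that retrieves fewer than `k` chunks is penalized under it.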

Background: With the continuous expansion of available biomedical data, efficient and effective information retrieval has become of utmost importance. Semantic expansion of queries using synonyms may improve information retrieval.

Objective: The aim of this study was to automatically construct and evaluate expanded PubMed queries of the form “preferred term”[MH] OR “preferred term”[TIAB] OR “synonym 1”[TIAB] OR “synonym 2”[TIAB] OR …, for each of the 28,313 Medical Subject Headings (MeSH) descriptors, by using different semantic expansion strategies. We sought to propose an innovative method that could automatically evaluate these strategies, based on the three main metrics used in information science (precision, recall, and F-measure).

Methods: Three semantic expansion strategies were assessed. They differed in the synonyms used to build the queries: MeSH synonyms, Unified Medical Language System (UMLS) mappings, and custom mappings (Catalogue et Index des Sites Médicaux de langue Française [CISMeF]). The precision, recall, and F-measure metrics were automatically computed for the three strategies and for the standard automatic term mapping (ATM) of PubMed. The method to automatically compute the metrics involved computing three counts: the number of all relevant citations (A), using National Library of Medicine indexing as the gold standard (“preferred term”[MH]); the number of citations retrieved by the added terms (“synonym 1”[TIAB] OR “synonym 2”[TIAB] OR …) (B); and the number of relevant citations retrieved by the added terms (combining the previous two queries with an “AND” operator) (C). It was possible to programmatically compute the metrics for each strategy using each of the 28,313 MeSH descriptors as a “preferred term,” corresponding to 239,724 different queries built and sent to the PubMed application programming interface. The four search strategies were ranked and compared for each metric.

Results: ATM had the worst performance for all three metrics among the four strategies. The MeSH strategy had the best mean precision (51%, SD 23%). The UMLS strategy had the best recall and F-measure (41%, SD 31% and 36%, SD 24%, respectively). CISMeF had the second best recall and F-measure (40%, SD 31% and 35%, SD 24%, respectively). However, considering a cutoff of 5%, CISMeF had better precision than UMLS for 1180 descriptors, better recall for 793 descriptors, and better F-measure for 678 descriptors.

Conclusions: This study highlights the importance of using semantic expansion strategies to improve information retrieval. However, the performance of a given strategy relative to another varied greatly depending on the MeSH descriptor. These results confirm there is no ideal search strategy for all descriptors. Different semantic expansions should be used depending on the descriptor and the user’s objectives. Thus, we developed an interface that allows users to input a descriptor and then proposes the best semantic expansion to maximize the three main metrics (precision, recall, and F-measure).
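
The query form and the counts A, B, and C described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the example term and synonyms are hypothetical, no request is sent to PubMed, and the formulas assume the standard definitions (precision = C/B, recall = C/A) with the F-measure taken as the balanced harmonic mean of the two, which the abstract does not state explicitly.

```python
def build_expanded_query(preferred_term, synonyms):
    """Build a query of the form "term"[MH] OR "term"[TIAB] OR "syn"[TIAB] ..."""
    clauses = [f'"{preferred_term}"[MH]', f'"{preferred_term}"[TIAB]']
    clauses += [f'"{synonym}"[TIAB]' for synonym in synonyms]
    return " OR ".join(clauses)


def metrics_from_counts(a, b, c):
    """Compute (precision, recall, F-measure) from the three citation counts.

    a: all relevant citations (gold standard, "preferred term"[MH])
    b: citations retrieved by the added terms
    c: relevant citations retrieved by the added terms (the two queries ANDed)
    """
    precision = c / b if b else 0.0
    recall = c / a if a else 0.0
    total = precision + recall
    # Assumption: balanced F-measure (F1), the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Illustrative term and synonyms, and made-up counts (not values from the study).
query = build_expanded_query("Neoplasms", ["Tumors", "Cancer"])
precision, recall, f_measure = metrics_from_counts(a=200, b=100, c=50)
```

With these made-up counts, half of the citations retrieved by the added terms are relevant (precision 0.5) and a quarter of all relevant citations are recovered (recall 0.25). Returning 0.0 when a denominator is zero is a design choice of this sketch for descriptors with no indexed or retrieved citations.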

Articles published on Biomedical Information Retrieval

Biomedical Information Retrieval with Positive-Unlabeled Learning and Knowledge Graphs

Optimizing biomedical information retrieval with a keyword frequency-driven prompt enhancement strategy

Comparative Evaluation of Pre-Trained Language Models for Biomedical Information Retrieval.

Deep Learning-Based Surgical Treatment Recommendation and Nonsurgical Prognosis Status Classification for Scaphoid Fractures by Automated X-ray Image Recognition.

Bridging the gap in biomedical information retrieval: Harnessing machine learning for enhanced search results and query semantics

Survey on Recommender Systems for Biomedical Items in Life and Health Sciences

BIR: Biomedical Information Retrieval System for Cancer Treatment in Electronic Health Record Using Transformers

Opportunities and challenges for ChatGPT and large language models in biomedicine and health.

MedCPT: Contrastive Pre-trained Transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval.

Improving the robustness and stability of a machine learning model for breast cancer prognosis through the use of multi-modal classifiers

MedGraph: A semantic biomedical information retrieval framework using knowledge graph embedding for PubMed.

An Efficient Method for Biomedical Entity Linking Based on Inter- and Intra-Entity Attention

Effective Natural Language Processing and Interpretable Machine Learning for Structuring CT Liver-Tumor Reports

Examining the Effect of the Ratio of Biomedical Domain to General Domain Data in Corpus in Biomedical Literature Mining

Use of web-based health information portals in primary health care: Experience from a rural Primary Health Centre in Haryana.

Efficacy and Safety of Endoscopic Resection and Open Surgery for Treating Thyroid Diseases: A Meta-Analysis

MeSHProbeNet-P

A2A: a platform for research in biomedical literature search

MeSH-Based Semantic Indexing Approach to Enhance Biomedical Information Retrieval

Identification of the Best Semantic Expansion to Query PubMed Through Automatic Performance Assessment of Four Search Strategies on All Medical Subject Heading Descriptors: Comparative Study.
