Health Care Decisions Research Articles

Background: Systematic reviews (SRs) are used to inform clinical practice guidelines and healthcare decision making by synthesising the results of primary studies. Retrieving as many relevant SRs as possible from a minimum number of databases is challenging, as there is currently no guidance on how to do this optimally. In a previous study, we determined which individual databases contain the most SRs, and which combination of databases retrieved the most SRs. In this study, we aimed to validate those previous results by using a different, larger, and more recent set of SRs.

Methods: We obtained a set of 100 Overviews of Reviews that included a total of 2276 SRs. SR inclusion was assessed in MEDLINE, Embase, and Epistemonikos. The mean inclusion rates (% of included SRs) and corresponding 95% confidence intervals were calculated for each database individually, as well as for combinations of MEDLINE with each other database and reference checking. Features of SRs not identified by the best database combination were reviewed qualitatively.

Results: Inclusion rates of SRs were similar in all three databases (mean inclusion rates in % with 95% confidence intervals: 94.3 [93.9–94.8] for MEDLINE, 94.4 [94.0–94.9] for Embase, and 94.4 [93.9–94.9] for Epistemonikos). Adding reference checking to MEDLINE increased the inclusion rate to 95.5 [95.1–96.0]. The best combination of two databases plus reference checking consisted of MEDLINE and Epistemonikos (98.1 [97.7–98.5]). Among the 44/2276 SRs not identified by this combination, 34 were published in journals from China, four were other journal publications, three were health agency reports, two were dissertations, and one was a preprint. When discounting the journal publications from China, the SR inclusion rate in the recommended combination (MEDLINE, Epistemonikos and reference checking) was even higher than in the previous study (99.6 vs. 99.2%).

Conclusions: A combination of databases and reference checking was the best approach to searching for biomedical SRs. Searching MEDLINE and Epistemonikos, complemented by checking the references of the included studies, was the most efficient approach and produced the highest recall. However, our results point to the presence of geographical bias, because some publications in journals from China were not identified.

Study registration: https://doi.org/10.17605/OSF.IO/R5EAS (Open Science Framework).
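The inclusion-rate arithmetic described in the Methods can be illustrated with a minimal Python sketch. The abstract does not state how the confidence intervals were computed, so the normal-approximation interval over per-overview rates used here is an assumption, and the function names and the toy counts are hypothetical.

```python
import math
from statistics import mean, stdev


def inclusion_rate(found: int, total: int) -> float:
    """Percentage of SRs that were found in a database (or combination)."""
    return 100.0 * found / total


def mean_with_ci(rates, z: float = 1.96):
    """Mean of per-overview rates with a normal-approximation 95% CI.

    Assumption: the abstract does not name its CI method; this is only
    one common choice, not necessarily the one the authors used.
    """
    m = mean(rates)
    se = stdev(rates) / math.sqrt(len(rates))
    return m, m - z * se, m + z * se


# Hypothetical toy data: (SRs found, SRs included) for four overviews.
overviews = [(18, 20), (9, 10), (24, 25), (30, 30)]
rates = [inclusion_rate(found, total) for found, total in overviews]
m, lower, upper = mean_with_ci(rates)
print(f"{m:.1f} [{lower:.1f}-{upper:.1f}]")

# Pooled proportion implied by the abstract: 44 of 2276 SRs were missed.
pooled = inclusion_rate(2276 - 44, 2276)
print(f"pooled: {pooled:.1f}")
```

The pooled proportion from the reported counts (2232 of 2276) rounds to 98.1%, which agrees with the figure reported for MEDLINE plus Epistemonikos plus reference checking.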


Recent surveys indicate that 48% of consumers actively use generative artificial intelligence (AI) for health-related inquiries. Despite widespread adoption and the potential to improve health care access, scant research examines the performance of AI chatbot responses regarding emergency care advice. We assessed the quality of AI chatbot responses to common emergency care questions. We sought to determine qualitative differences in responses from 4 free-access AI chatbots for 10 different serious and benign emergency conditions. We created 10 emergency care questions that we fed into the free-access versions of ChatGPT 3.5 (OpenAI), Google Bard, Bing AI Chat (Microsoft), and Claude AI (Anthropic) on November 26, 2023. Each response was graded by 5 board-certified emergency medicine (EM) faculty on 8 domains: percentage accuracy, presence of dangerous information, factual accuracy, clarity, completeness, understandability, source reliability, and source relevancy. We determined the correct, complete response to the 10 questions from reputable and scholarly emergency medical references. These were compiled by an EM resident physician. For the readability of the chatbot responses, we used the Flesch-Kincaid Grade Level of each response from the readability statistics embedded in Microsoft Word. Differences between chatbots were determined by the chi-square test. Each of the 4 chatbots' responses to the 10 clinical questions was scored across 8 domains by 5 EM faculty, yielding 400 assessments for each chatbot. Together, the 4 chatbots had the best performance in clarity and understandability (both 85%), intermediate performance in accuracy and completeness (both 50%), and poor performance (10%) for source relevance and reliability (mostly unreported). Chatbots contained dangerous information in 5% to 35% of responses, with no statistical difference between chatbots on this metric (P=.24). ChatGPT, Google Bard, and Claude AI had similar performances across 6 out of 8 domains. Only Bing AI performed better, with more identified or relevant sources (40%; the others had 0%-10%). The Flesch-Kincaid Grade Level was 7.7-8.9 for all chatbots except ChatGPT (10.8); all were too advanced for average emergency patients. Responses included both dangerous advice (eg, starting cardiopulmonary resuscitation with no pulse check) and generally inappropriate advice (eg, loosening the collar to improve breathing without evidence of airway compromise). AI chatbots, though ubiquitous, have significant deficiencies in EM patient advice, despite relatively consistent performance. Information on when to seek urgent or emergent care is frequently incomplete and inaccurate, and patients may be unaware of misinformation. Sources are not generally provided. Patients who use AI to guide health care decisions assume potential risks. AI chatbots for health should be subject to further research, refinement, and regulation. We strongly recommend proper medical consultation to prevent potential adverse outcomes.
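The readability measure named above can be sketched with the standard Flesch-Kincaid Grade Level formula, 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The study used the statistics built into Microsoft Word, so the sentence splitter and the crude vowel-group syllable counter below are assumptions for illustration only and will not reproduce Word's values exactly; all function names are hypothetical.

```python
import re


def fk_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level from raw counts (standard formula)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def count_syllables(word: str) -> int:
    """Crude heuristic (assumption): one syllable per run of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fk_grade_from_text(text: str) -> float:
    """Approximate grade level of a text using the heuristics above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return fk_grade(len(words), len(sentences), syllables)


# 100 words in 5 sentences with 150 syllables.
print(round(fk_grade(100, 5, 150), 2))
print(round(fk_grade_from_text("The cat sat. The dog ran."), 2))
```

Longer sentences and more syllables per word both raise the grade level, which is why the longer, more technical chatbot responses score higher.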


Articles published on Health Care Decisions

Health Diagnostic Assistant using LLMs

Hospital Length-of-Stay Prediction Using Machine Learning Algorithms—A Literature Review

Challenges associated with access to recently developed hemophilia treatments in routine care: perspectives of healthcare professionals.

Widely Integrated Services in Home (WISH) for homebound older adults: a study protocol for a randomized encouragement trial.

Recounting the untold stories of breast cancer patient experiences: lessons learned from a patient-public involvement and engagement storytelling event.

Indirect Treatment Comparisons in Healthcare Decision Making: A Targeted Review of Regulatory Approval, Reimbursement, and Pricing Recommendations Globally for Oncology Drugs in 2021-2023.

Identifying and prioritizing inefficiency causes in Iran's health system.

What are the priorities of consumers and carers regarding measurement for evaluation in mental healthcare? Results from a Q-methodology study.

Ensuring AI Algorithm Fairness in Healthcare Decision-Making

The optimal approach for retrieving systematic reviews was achieved when searching MEDLINE and Epistemonikos in addition to reference checking: a methodological validation study

Potential of ChatGPT Schematics to Enhance Patient Understanding and Healthcare Decision-Making

Cost-effectiveness of radiofrequency echographic multi-spectrometry (REMS) for the diagnosis of osteoporosis in the United States

Edge computing-based ensemble learning model for health care decision systems

A thematic analysis of shared decision-making in consultations with patients with a presumed brain tumor and neurosurgeons

Mental healthcare and pragmatic shared decision-making in general practice: An interview study.

Norwegian and Swedish value sets for the EORTC QLU-C10D utility instrument.

Developing and Evaluating SEE-Diabetes: A Patient-Centered Educational Decision Support System for Diabetes Care.

Accuracy of Prospective Assessments of 4 Large Language Model Chatbot Responses to Patient Questions About Emergency Care: Experimental Comparative Study.

Investigation of reporting bias in interrupted time series (ITS) studies: a study protocol

The human rights act and the community nurse.
