Tailoring Treatment in the Age of AI: A Systematic Review of Large Language Models in Personalized Healthcare

Abstract

Large Language Models (LLMs) are increasingly proposed to personalize healthcare delivery, yet their real-world readiness remains uncertain. We conducted a systematic literature review to assess how LLM-based systems are designed and used to enhance patient engagement and personalization, while identifying open challenges these tools pose. Four digital libraries (Scopus, IEEE Xplore, ACM, and Nature) were searched, yielding 3787 studies; 16 met the inclusion criteria. Most studies, published in 2024, span different types of motivations, architectures, limitations and privacy-preserving approaches. While LLMs show potential in automating patient data collection, recommendation/therapy generation, and continuous conversational support, their clinical reliability is limited. Most evaluations use synthetic or retrospective data, with only a few employing user studies or scalable simulation environments. This review highlights the tension between innovation and clinical applicability, emphasizing the need for robust evaluation protocols and human-in-the-loop systems to guide the safe and equitable deployment of LLMs in healthcare.

Similar Papers
  • Research Article
  • Citations: 20
  • 10.15288/jsad.2011.72.965
The Role of Academic Motivation in High School Students’ Current and Lifetime Alcohol Consumption: Adopting a Self-Determination Theory Perspective
  • Nov 1, 2011
  • Journal of Studies on Alcohol and Drugs
  • Stephanie V Wormington + 2 more

The current study investigated the relationship between different types of academic motives (specifically, intrinsic motivation, introjected regulation, and external regulation) and high school students' current and lifetime alcohol consumption. One thousand sixty-seven high school students completed measures of academic motivation, other school-related factors, and lifetime and current alcohol consumption. Using structural equation modeling, different types of motivation and school-related factors were differentially related to student drinking. Specifically, intrinsic motivation was negatively related to lifetime and current alcohol consumption. External regulation, on the other hand, was positively associated with current drinking. Grade point average was the only school-related factor related to student alcohol use. These findings suggest that motivation is an important construct to consider in predicting students' alcohol use, even when other more commonly studied educational variables are considered. In addition, they support the adoption of a motivation framework that considers different types of motivation in understanding the relationship between academic motivation and alcohol use. Suggestions for incorporating the self-determination model of motivation into studies of alcohol and substance use, as well as potential impacts on intervention efforts, are discussed. In particular, it may be important to foster only certain types of motivation, rather than all types of academically focused motives, in efforts to deter alcohol use.

  • Research Article
  • Citations: 5
  • 10.1080/10494820.2023.2248220
How does adaptive gamification impact different types of student motivation over time?
  • Aug 26, 2023
  • Interactive Learning Environments
  • S Dumas Reyssier + 5 more

The gamification approach is often used in educational settings, with widely varying results on learner motivation. In recent years, a trend has emerged toward adaptive gamification that fits learners' preferences for game mechanics, but little is known about how the adaptation of different game elements impacts different types of learner motivation. In this paper, we investigate in depth the effects of adaptive gamification on a continuum ranging from intrinsic motivation for knowledge to amotivation, by assigned game element. We conducted a field study involving 121 secondary school students (aged 13 to 15) over 4–6 weeks to compare the impact of adapted game elements with that of randomly assigned ones. This approach revealed the following findings: (1) the impact of gamification (whether adapted or not) differs for each type of motivation, (2) the effects of using the gamified environment were only observed after five lessons, (3) the adaptation of the game elements seems to reinforce their effects on learners' motivation, and (4) each game element had specific effects on different types of motivation: while adapted Avatar and Timer each had some positive and negative effects, Progress had mainly detrimental ones.

  • Research Article
  • Citations: 8
  • 10.1287/ijds.2023.0007
How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
  • Apr 1, 2023
  • INFORMS Journal on Data Science
  • Galit Shmueli + 7 more

  • Preprint Article
  • 10.2196/preprints.71916
Implementing Large Language Models in Health Care: Clinician-Focused Review With Interactive Guideline (Preprint)
  • Jan 29, 2025
  • Hongyi Li + 2 more

BACKGROUND Large language models (LLMs) can generate outputs understandable by humans, such as answers to medical questions and radiology reports. With the rapid development of LLMs, clinicians face a growing challenge in determining the most suitable algorithms to support their work. OBJECTIVE We aimed to provide clinicians and other health care practitioners with systematic guidance in selecting an LLM that is relevant and appropriate to their needs and facilitate the integration process of LLMs in health care. METHODS We conducted a literature search of full-text publications in English on clinical applications of LLMs published between January 1, 2022, and March 31, 2025, on PubMed, ScienceDirect, Scopus, and IEEE Xplore. We excluded papers from journals below a set citation threshold, as well as papers that did not focus on LLMs, were not research based, or did not involve clinical applications. We also conducted a literature search on arXiv within the same investigated period and included papers on the clinical applications of innovative multimodal LLMs. This led to a total of 270 studies. RESULTS We collected 330 LLMs and recorded their application frequency in clinical tasks and frequency of best performance in their context. On the basis of a 5-stage clinical workflow, we found that stages 2, 3, and 4 are key stages in the clinical workflow, involving numerous clinical subtasks and LLMs. However, the diversity of LLMs that may perform optimally in each context remains limited. GPT-3.5 and GPT-4 were the most versatile models in the 5-stage clinical workflow, applied to 52% (29/56) and 71% (40/56) of the clinical subtasks, respectively, and they performed best in 29% (16/56) and 54% (30/56) of the clinical subtasks, respectively. General-purpose LLMs may not perform well in specialized areas as they often require lightweight prompt engineering methods or fine-tuning techniques based on specific datasets to improve model performance. 
Most LLMs with multimodal abilities are closed-source models and therefore lack transparency, model customization, and fine-tuning options for specific clinical tasks; they may also pose challenges regarding data protection and privacy, which are common requirements in clinical settings. CONCLUSIONS In this review, we found that LLMs may help clinicians in a variety of clinical tasks. However, we did not find evidence of generalist clinical LLMs successfully applicable to a wide range of clinical tasks. Therefore, their clinical deployment remains challenging. On the basis of this review, we propose an interactive online guideline for clinicians to select suitable LLMs by clinical task. With a clinical perspective and free of unnecessary technical jargon, this guideline may be used as a reference to successfully apply LLMs in clinical settings.

  • Research Article
  • Citations: 60
  • 10.1061/(asce)me.1943-5479.0000595
Impact of Safety Climate on Types of Safety Motivation and Performance: Multigroup Invariance Analysis
  • Jan 17, 2018
  • Journal of Management in Engineering
  • Huey Wen Lim + 3 more

Safety climate has a significant impact on safety motivation. Most prior studies focused on how motivated employees are in a unidimensional safety motivation scale, but they have overlooked why employees are motivated to work safely. The self-determination theory (SDT) is adopted in the present study to investigate how safety climate factors can predict different types of motivation (i.e., intrinsic, identified, introjected, external), consequently, affecting safety performance. There were a total of 392 respondents from questionnaire surveys that were undertaken in both the Chinese and Malaysian construction industries. Multigroup confirmatory factor analysis (MGCFA) and path analysis were performed and achieved an invariance model fit across samples. Safety competence and supportive environment were identified as the most important factors that predict intrinsic and identified motivation in the Chinese sample. On the other hand, safety commitment and safety communication were identified to predict intrinsic motivation in the Malaysian sample. In the Malaysian sample, intrinsic motivation predicted not only safety participation but also safety compliance. This was explained by their “self-leadership,” which exerted responsibility and autonomy to the employee’s own safety. The present study contributes to the body of knowledge by revealing the interplay mechanism of different types of safety motivation (multidimensional) in the relationship between safety climate and safety performance. The application of MGCFA and path analysis advances the existing literature by enabling the identification of safety climate factors that can predict types of safety motivation and safety performance. Moreover, the comparison of results between two countries extends the body of research on safety motivation with a cross-cultural perspective. 
These findings provide practitioners with a guideline to make a more precise assessment of employees’ types of motivation based on which measures can be taken to improve safety management and safety performance, either in domestic or multicultural work teams.

  • Research Article
  • Citations: 4
  • 10.2196/72062
Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis
  • Jun 9, 2025
  • Journal of Medical Internet Research
  • Hankun Su + 14 more

The integration of large language models (LLMs) into medical diagnostics has garnered substantial attention due to their potential to enhance diagnostic accuracy, streamline clinical workflows, and address health care disparities. However, the rapid evolution of LLM research necessitates a comprehensive synthesis of their applications, challenges, and future directions. This scoping review aimed to provide an overview of the current state of research regarding the use of LLMs in medical diagnostics. The study sought to answer four primary subquestions, as follows: (1) Which LLMs are commonly used? (2) How are LLMs assessed in diagnosis? (3) What is the current performance of LLMs in diagnosing diseases? (4) Which medical domains are investigating the application of LLMs? This scoping review was conducted according to the Joanna Briggs Institute Manual for Evidence Synthesis and adheres to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). Relevant literature was searched from the Web of Science, PubMed, Embase, IEEE Xplore, and ACM Digital Library databases from 2022 to 2025. Articles were screened and selected based on predefined inclusion and exclusion criteria. Bibliometric analysis was performed using VOSviewer to identify major research clusters and trends. Data extraction included details on LLM types, application domains, and performance metrics. The field is rapidly expanding, with a surge in publications after 2023. GPT-4 and its variants dominated research (70/95, 74% of studies), followed by GPT-3.5 (34/95, 36%). Key applications included disease classification (text or image-based), medical question answering, and diagnostic content generation. LLMs demonstrated high accuracy in specialties like radiology, psychiatry, and neurology but exhibited biases in race, gender, and cost predictions. 
Ethical concerns, including privacy risks and model hallucination, alongside regulatory fragmentation, were critical barriers to clinical adoption. LLMs hold transformative potential for medical diagnostics but require rigorous validation, bias mitigation, and multimodal integration to address real-world complexities. Future research should prioritize explainable artificial intelligence frameworks, specialty-specific optimization, and international regulatory harmonization to ensure equitable and safe clinical deployment.

  • Preprint Article
  • 10.2196/preprints.72062
Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis (Preprint)
  • Feb 2, 2025
  • Hankun Su + 14 more

BACKGROUND The integration of large language models (LLMs) into medical diagnostics has garnered substantial attention due to their potential to enhance diagnostic accuracy, streamline clinical workflows, and address health care disparities. However, the rapid evolution of LLM research necessitates a comprehensive synthesis of their applications, challenges, and future directions. OBJECTIVE This scoping review aimed to provide an overview of the current state of research regarding the use of LLMs in medical diagnostics. The study sought to answer four primary subquestions, as follows: (1) Which LLMs are commonly used? (2) How are LLMs assessed in diagnosis? (3) What is the current performance of LLMs in diagnosing diseases? (4) Which medical domains are investigating the application of LLMs? METHODS This scoping review was conducted according to the Joanna Briggs Institute Manual for Evidence Synthesis and adheres to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). Relevant literature was searched from the Web of Science, PubMed, Embase, IEEE Xplore, and ACM Digital Library databases from 2022 to 2025. Articles were screened and selected based on predefined inclusion and exclusion criteria. Bibliometric analysis was performed using VOSviewer to identify major research clusters and trends. Data extraction included details on LLM types, application domains, and performance metrics. RESULTS The field is rapidly expanding, with a surge in publications after 2023. GPT-4 and its variants dominated research (70/95, 74% of studies), followed by GPT-3.5 (34/95, 36%). Key applications included disease classification (text or image-based), medical question answering, and diagnostic content generation. LLMs demonstrated high accuracy in specialties like radiology, psychiatry, and neurology but exhibited biases in race, gender, and cost predictions. 
Ethical concerns, including privacy risks and model hallucination, alongside regulatory fragmentation, were critical barriers to clinical adoption. CONCLUSIONS LLMs hold transformative potential for medical diagnostics but require rigorous validation, bias mitigation, and multimodal integration to address real-world complexities. Future research should prioritize explainable artificial intelligence frameworks, specialty-specific optimization, and international regulatory harmonization to ensure equitable and safe clinical deployment.

  • Research Article
  • 10.1108/ijem-02-2023-0080
The role of basic psychological needs satisfaction (BPNS) during the initial use of online teaching platforms on faculty members’ continuance intention
  • Mar 12, 2024
  • International Journal of Educational Management
  • Arash Kamali + 2 more

Purpose: Based on self-determination theory (SDT), this study aims to investigate the motivational antecedents of faculty members’ continuance intention of using online teaching platforms. For this purpose, we introduced a model incorporating basic psychological needs satisfaction (BPNS) and different motivational mechanisms. Design/methodology/approach: Using a survey study of 312 faculty members, we examined the model by structural equation modeling (SEM). Findings: The SEM results revealed a positive correlation between BPNS and continuance intention. Additionally, we illustrate the importance of different types of extrinsic motivation. By presenting an alternative model, we demonstrate that the initial-use-identified regulation (one type of extrinsic motivation) has an association with continuance intention (CI). However, this association loses significance if BPNS is present within the model. Moreover, we determined that there is no significant relationship between initial-use external regulation (another type of extrinsic motivation) and faculty members' CI for online teaching. Lastly, the results revealed that pre-use amotivation and intrinsic motivation impact CI through initial-use BPNS. Research limitations/implications: The results suggest that decision-makers at educational institutions should consider that extrinsic motivation has different types with different impacts and that BPNS has a vital role in faculty members’ intention to continue using online teaching platforms. Originality/value: This study is novel because it reveals some details of extrinsic motivation effects by offering a model that combines BPNS and different types of motivation in two stages. It is important and rare that we concentrate on the almost neglected issue of faculty members’ motivational perspectives in online teaching, while the literature mainly focuses on students’ perspectives.

  • Research Article
  • Citations: 5
  • 10.1186/s40594-024-00502-6
One size doesn’t fit all: how different types of learning motivations influence engineering undergraduate students’ success outcomes
  • Aug 28, 2024
  • International Journal of STEM Education
  • Xi Wang + 2 more

Background: Motivation is the inherent belief to guide students' learning goals and behaviors to make continuous efforts and strengthen learning outcomes. Previous research reported the positive impacts of learning motivation on student success, but there have been limited efforts in systematically and structurally studying different types of motivations and their impacts on students’ success in engineering education. The current study contributes to the literature by systematically examining two important types of motivations and their influences on undergraduate engineering students in a theoretically grounded manner while using an advanced analytical approach. Methods: The current study conducted a cross-sectional survey with undergraduate engineering students (n = 514) from 18 different schools across nine U.S. states. The survey assessed students’ self-report scores on six types of motivations to study developed based on formative research and the current literature and then collected students’ self-reported learning outcomes, current GPA, university satisfaction, engineering program satisfaction, and individual demographic factors. The data were then analyzed using structural equation modeling. Results: The results showed that motivations related to family, personality, and academic expectations were consistently positively associated with all measured students’ success outcomes; motivations related to educators were associated with all four outcomes but student GPA; motivations related to course contents were associated with learning outcomes and student GPA; and motivations related to peers did not predict any of the four measured students’ success outcomes. Discussion: We explain some of the unexpected results with further literature that examines engineering culture and ecology. We also make recommendations related to cognitive training, tailored engineering education, peer culture interventions, and family orientation programs.

  • Conference Article
  • Citations: 3
  • 10.1115/detc2015-47625
Considering Different Motivations in Design for Consumer-Behavior Change
  • Aug 2, 2015
  • Jayesh Srivastava + 1 more

Much existing work aims to understand how to change human behavior through product-design interventions. Given the diversity of individuals and their motivations, solutions that address different motives are surprisingly rare. We aim to develop and validate a framework that clearly identifies and targets different types of behavioral motives in users. We present a behavior model comprising egoistic, sociocultural and altruistic motives, and apply the model to sustainable behavior. We confirmed the explanatory power of the behavior model by categorizing user comments about an international environmental agreement from multiple news sources. We next developed concepts, each intended to target a single motive type, and elicited evaluations from online respondents who self-assessed their motivation type after evaluating the concepts. We present and discuss correlation results between motive types and preference for products that target these types for two iterations of the experiment. Deviations from our expected results are mainly due to unexpected perceptions, both positive and negative, of our concepts. Despite this, the main value of this work lies in the explicit consideration of a manageable number of different types of motives. A proposed design tool incorporates the three types of motives from the model with the different levels of persuasion others have proposed to change user behavior.

  • Research Article
  • Citations: 2
  • 10.2196/70535
Unveiling the Potential of Large Language Models in Transforming Chronic Disease Management: Mixed Methods Systematic Review
  • Apr 16, 2025
  • Journal of Medical Internet Research
  • Caixia Li + 7 more

Chronic diseases are a major global health burden, accounting for nearly three-quarters of the deaths worldwide. Large language models (LLMs) are advanced artificial intelligence systems with transformative potential to optimize chronic disease management; however, robust evidence is lacking. This review aims to synthesize evidence on the feasibility, opportunities, and challenges of LLMs across the disease management spectrum, from prevention to screening, diagnosis, treatment, and long-term care. Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) guidelines, 11 databases (Cochrane Central Register of Controlled Trials, CINAHL, Embase, IEEE Xplore, MEDLINE via Ovid, ProQuest Health & Medicine Collection, ScienceDirect, Scopus, Web of Science Core Collection, China National Knowledge Internet, and SinoMed) were searched on April 17, 2024. Intervention and simulation studies that examined LLMs in the management of chronic diseases were included. The methodological quality of the included studies was evaluated using a rating rubric designed for simulation-based research and the risk of bias in nonrandomized studies of interventions tool for quasi-experimental studies. Narrative analysis with descriptive figures was used to synthesize the study findings. Random-effects meta-analyses were conducted to assess the pooled effect estimates of the feasibility of LLMs in chronic disease management. A total of 20 studies examined general-purpose (n=17) and retrieval-augmented generation-enhanced LLMs (n=3) for the management of chronic diseases, including cancer, cardiovascular diseases, and metabolic disorders. 
LLMs demonstrated feasibility across the chronic disease management spectrum by generating relevant, comprehensible, and accurate health recommendations (pooled accuracy rate 71%, 95% CI 0.59-0.83; I²=88.32%) with retrieval-augmented generation-enhanced LLMs having higher accuracy rates compared to general-purpose LLMs (odds ratio 2.89, 95% CI 1.83-4.58; I²=54.45%). LLMs facilitated equitable information access; increased patient awareness regarding ailments, preventive measures, and treatment options; and promoted self-management behaviors in lifestyle modification and symptom coping. Additionally, LLMs facilitate compassionate emotional support, social connections, and health care resources to improve the health outcomes of chronic diseases. However, LLMs face challenges in addressing privacy, language, and cultural issues; undertaking advanced tasks, including diagnosis, medication, and comorbidity management; and generating personalized regimens with real-time adjustments and multiple modalities. LLMs have demonstrated the potential to transform chronic disease management at the individual, social, and health care levels; however, their direct application in clinical settings is still in its infancy. A multifaceted approach that incorporates robust data security, domain-specific model fine-tuning, multimodal data integration, and wearables is crucial for the evolution of LLMs into invaluable adjuncts for health care professionals to transform chronic disease management. PROSPERO CRD42024545412; https://www.crd.york.ac.uk/PROSPERO/view/CRD42024545412.

  • Research Article
  • 10.2196/76326
Large Language Models in Critical Care Medicine: Scoping Review
  • Nov 24, 2025
  • JMIR Medical Informatics
  • Tongyue Shi + 9 more

With the rapid development of artificial intelligence, large language models (LLMs) have shown strong capabilities in natural language understanding, reasoning, and generation, attracting much research interest in applying LLMs to health and medicine. Critical care medicine (CCM) provides diagnosis and treatment for patients with critical illness who often require intensive monitoring and interventions in intensive care units (ICUs). Whether LLMs can be applied to CCM, and whether they can operate as ICU experts in assisting clinical decision-making rather than "stochastic parrots," remains uncertain. This scoping review aims to provide a panoramic portrait of the application of LLMs in CCM, identifying the advantages, challenges, and future potential of LLMs in this field. This study was conducted in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. Literature was searched across 7 databases, including PubMed, Embase, Scopus, Web of Science, CINAHL, IEEE Xplore, and ACM Digital Library, from the first available paper to August 22, 2025. From an initial 2342 retrieved papers, 41 were selected for final review. LLMs played an important role in CCM through the following 3 main channels: clinical decision support, medical documentation and reporting, and medical education and doctor-patient communication. Compared to traditional artificial intelligence models, LLMs have advantages in handling unstructured data and do not require manual feature engineering. Meanwhile, applying LLMs to CCM has faced challenges, including hallucinations and poor interpretability, sensitivity to prompts, bias and alignment challenges, and privacy and ethical issues. Although LLMs are not yet ICU experts, they have the potential to become valuable tools in CCM, helping to improve patient outcomes and optimize health care delivery. 
Future research should enhance model reliability and interpretability, improve model training and deployment scalability, integrate up-to-date medical knowledge, and strengthen privacy and ethical guidelines, paving the way for LLMs to fully realize their impact in critical care. OSF Registries yn328; https://osf.io/yn328/.

  • Research Article
  • Citations: 3
  • 10.1016/j.psychsport.2016.06.002
On the formation of favourable impressions: Associations between self-presentation motives, task behaviour, and others’ evaluations of the self in a team-sport setting
  • Jun 3, 2016
  • Psychology of Sport and Exercise
  • Timothy C Howle + 2 more

  • Research Article
  • Citations: 1
  • 10.1051/shsconf/202111701004
The role of educational motivation in the creativity of intellectually gifted primary schoolchildren
  • Jan 1, 2021
  • SHS Web of Conferences
  • Natalia B Shumakova + 1 more

The importance of external and internal motivation in the academic success of students and their persistence in achieving educational and cognitive goals is considered within the framework of the theory of self-determination. The role of different types of educational motivation in the development of the creative potential of students remains insufficiently studied, especially in relation to primary schoolchildren. The research objective is to clarify the role of different motives for learning activity through differences in the figurative and verbal creativity of intellectually gifted primary schoolchildren on the threshold of adolescence. The study involved 96 intellectually gifted primary schoolchildren of the 3rd-4th grades (the average age is 9.6; the number of boys and girls is the same). The internal and external motivation of their educational activity was studied using the “Scale of educational motivation” developed by T.O. Gordeeva based on the Academic Self-Regulation Questionnaire (SRQ-A) by Ryan and Connell. Divergent creativity was analyzed using N.B. Shumakova’s methodology “Figurative and verbal creativity”. An ambiguous relationship has been revealed between different types of motivation and indicators of verbal and imaginative creativity. Internal cognitive motivation and self-development are reliably and directly related to imaginative fluency (r=0.28 and r=0.24), while external motivation (parental regulation) and imaginative creativity (including imaginative originality and elaboration) are linked reversely (r=-0.31, r=-0.24). The regression analysis has demonstrated that external motivation (learning for the sake of fulfilling the parents’ requirements) at primary school age is a negative predictor of imaginative creativity and originality of intellectually gifted students in their adolescence (F=6.91, β=-0.321, p=0.01 and F=6.57, β=-0.314, p=0.01).

  • Research Article
  • Citations: 25
  • 10.1186/s12911-025-02954-4
A systematic review of large language model (LLM) evaluations in clinical medicine
  • Mar 7, 2025
  • BMC Medical Informatics and Decision Making
  • Sina Shool + 5 more

Background: Large Language Models (LLMs), advanced AI tools based on transformer architectures, demonstrate significant potential in clinical medicine by enhancing decision support, diagnostics, and medical education. However, their integration into clinical workflows requires rigorous evaluation to ensure reliability, safety, and ethical alignment. Objective: This systematic review examines the evaluation parameters and methodologies applied to LLMs in clinical medicine, highlighting their capabilities, limitations, and application trends. Methods: A comprehensive review of the literature was conducted across PubMed, Scopus, Web of Science, IEEE Xplore, and arXiv databases, encompassing both peer-reviewed and preprint studies. Studies were screened against predefined inclusion and exclusion criteria to identify original research evaluating LLM performance in medical contexts. Results: The results reveal a growing interest in leveraging LLM tools in clinical settings, with 761 studies meeting the inclusion criteria. While general-domain LLMs, particularly ChatGPT and GPT-4, dominated evaluations (93.55%), medical-domain LLMs accounted for only 6.45%. Accuracy emerged as the most commonly assessed parameter (21.78%). Despite these advancements, the evidence base highlights certain limitations and biases across the included studies, emphasizing the need for careful interpretation and robust evaluation frameworks. Conclusions: The exponential growth in LLM research underscores their transformative potential in healthcare. However, addressing challenges such as ethical risks, evaluation variability, and underrepresentation of critical specialties will be essential. Future efforts should prioritize standardized frameworks to ensure safe, effective, and equitable LLM integration in clinical practice.
