
Accessibility Issue Detection and Repair in Mobile Applications: A Systematic Literature Review

Abstract

Mobile application accessibility is crucial for digital inclusion. This paper presents a systematic literature review (SLR) following PRISMA guidelines, synthesizing advancements in accessibility issue detection and repair techniques. We analyze representative mobile accessibility tools and industrial practices to complement the SLR evidence. Nine major databases (ACM DL, IEEE Xplore, ScienceDirect, SpringerLink, Wiley, Web of Science, Scopus, CNKI, arXiv) were searched, and 76 high-quality studies published up to July 2025 were analyzed. Common accessibility issues are categorized into four types (perceptibility, operability, understandability, and robustness), aligned with user capability modeling. In accessibility issue detection, the evolution from static analysis to deep learning and large language models (LLMs) has shifted research from rule-based matching to semantic reasoning, improving automation and generalization. However, detection and repair remain loosely coupled and fragmented, with high false-positive rates and weak feedback loops between modules. For repair, we propose a taxonomy of rule-driven, learning-based, and LLM-assisted approaches, revealing a gap between the breadth of detection research and the depth of repair research. LLMs show potential for semantic understanding, issue reasoning, and repair generation, paving the way toward intelligent accessibility agents. Looking ahead, future research should focus on semantic-enhanced detection, multimodal repair, refined user capability modeling, and unified evaluation standards, collectively advancing accessibility engineering toward an accessibility-by-default design paradigm.
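The rule-based matching that the review identifies as the starting point of detection research can be illustrated with a minimal sketch. The simplified view-tree model (`Node`), the two rules, and their mapping onto the perceptibility and operability categories are illustrative assumptions, not taken from the review or from any particular tool. The 48 dp threshold follows Android's published touch-target guidance. Real detectors inspect actual UI hierarchies and apply far more rules.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Assumed threshold, following Android's 48dp minimum touch-target guidance.
MIN_TOUCH_TARGET_DP = 48


@dataclass
class Node:
    """Hypothetical simplified UI node; not a real Android API type."""
    node_id: str
    widget: str                      # e.g. "ImageButton", "Button"
    clickable: bool = False
    label: Optional[str] = None      # stands in for contentDescription / visible text
    width_dp: float = 0.0
    height_dp: float = 0.0
    children: List["Node"] = field(default_factory=list)


def walk(node: Node):
    """Depth-first traversal of the view tree."""
    yield node
    for child in node.children:
        yield from walk(child)


def check_accessibility(root: Node) -> List[Tuple[str, str, str]]:
    """Return (node_id, category, message) findings from two illustrative rules."""
    findings = []
    for node in walk(root):
        if not node.clickable:
            continue
        # Perceptibility rule (assumed mapping): clickable elements need a
        # non-empty label so that screen readers can announce them.
        if not (node.label and node.label.strip()):
            findings.append((node.node_id, "perceptibility", "missing label"))
        # Operability rule (assumed mapping): clickable elements should meet
        # the minimum touch-target size in both dimensions.
        if min(node.width_dp, node.height_dp) < MIN_TOUCH_TARGET_DP:
            findings.append((node.node_id, "operability", "touch target too small"))
    return findings


# Hypothetical screen: an unlabeled, undersized icon button and a compliant button.
screen = Node("root", "LinearLayout", children=[
    Node("btn_share", "ImageButton", clickable=True, label=None, width_dp=32, height_dp=32),
    Node("btn_ok", "Button", clickable=True, label="OK", width_dp=88, height_dp=48),
])

for finding in check_accessibility(screen):
    print(finding)
```

Rules like these are cheap and predictable, but they only match surface patterns: a label such as "button" passes the check while still conveying nothing, which is the kind of semantic gap the review says learning-based and LLM-based detectors aim to close.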

Similar Papers
  • Conference Article
  • 10.1109/iconscept66142.2025.11436745
Context Aware Sentiment Analysis On Social Media Using Large Language Models: A Systematic Survey
  • Dec 6, 2025
  • Sowndarya V + 1 more

User-generated social media messages often contain significant multimodal content. These messages are typically short and do not include explicit sentiment words. Nonetheless, the sentiment of such messages can be inferred from their context, which is important for improving sentiment analysis performance. Unfortunately, most previous studies examine the effect of contextual information under a single data model. Conventional sentiment analysis methods tend to miss contextual details such as sarcasm, domain-specific expressions, and multimodal language, resulting in lower accuracy in practical deployments. To alleviate these limitations, context-based solutions that exploit recent developments in deep learning and Large Language Models (LLMs) have come to the forefront. This systematic review explores the role of LLMs in enabling context-sensitive sentiment analysis on social media from 2019 to 2025. State-of-the-art approaches and datasets were identified through a systematic search of leading academic databases, such as IEEE Xplore, ACM Digital Library, SpringerLink, ScienceDirect, Scopus, and Web of Science. This work lays a solid groundwork for practitioners and researchers who aim to advance context-sensitive sentiment analysis in the age of big social media data.

  • Research Article
  • Cited by 4
  • 10.1109/jbhi.2025.3584179
Precision and Personalization: How Large Language Models Redefining Diagnostic Accuracy in Personalized Medicine - A Systematic Literature Review.
  • Jan 1, 2025
  • IEEE journal of biomedical and health informatics
  • A K N L Aththanagoda + 2 more

Personalized medicine aims to tailor medical treatments to the unique characteristics of each patient, but its effectiveness relies on achieving diagnostic accuracy to fully understand individual variability in disease response and treatment efficacy. This systematic literature review explores the role of large language models (LLMs) in enhancing diagnostic precision and supporting the advancement of personalized medicine. A comprehensive search was conducted across Web of Science, Science Direct, Scopus, and IEEE Xplore, targeting peer-reviewed articles published in English between January 2020 and March 2025 that applied LLMs within personalized medicine contexts. Following PRISMA guidelines, 39 relevant studies were selected and systematically analyzed. The findings indicate a growing integration of LLMs across key domains such as clinical informatics, medical imaging, patient-specific diagnosis, and clinical decision support. LLMs have shown potential in uncovering subtle data patterns critical for accurate diagnosis and personalized treatment planning. This review highlights the expanding role of LLMs in improving diagnostic accuracy in personalized medicine, offering insights into their performance, applications, and challenges, while also acknowledging limitations in generalizability due to variable model performance and dataset biases. The review highlights the importance of addressing challenges related to data privacy, model interpretability, and reliability across diverse clinical scenarios. For successful clinical integration, future research must focus on refining LLM technologies, ensuring ethical standards, and validating models continuously to safeguard effective and responsible use in healthcare environments.

  • Research Article
  • Cited by 1
  • 10.2214/ajr.25.33759
Artificial Intelligence for CT and MRI Protocoling: A Meta-Analysis of Traditional Machine Learning, BERT, and Large Language Models.
  • Oct 29, 2025
  • AJR. American journal of roentgenology
  • Ethan Sacoransky + 2 more

BACKGROUND. Examination protocoling is a resource-intensive task. Various artificial intelligence (AI) approaches have been investigated to automate this process. OBJECTIVE. The purpose of this study was to evaluate the performance of traditional machine learning (ML) models, bidirectional encoder representations from transformers (BERT) models, and large language models (LLMs) for automated CT and MRI protocoling. EVIDENCE ACQUISITION. MEDLINE, Embase, Scopus, Web of Science, IEEE Xplore, and Google Scholar databases were searched through July 2025 for studies reporting the performance of an AI-based technique in assigning protocols for CT or MRI requisitions. Accuracy results were separately extracted for all models tested in each study and pooled using a random-effects meta-analysis. AI approaches were compared using Welch t tests. Common sources of error were qualitatively summarized. EVIDENCE SYNTHESIS. The final analysis included 23 studies, comprising 1,196,259 imaging requisitions. Requisition subspecialties included body imaging (n = 4), musculoskeletal imaging (n = 3), neuroradiology (n = 6), thoracic imaging (n = 1), and multiple subspecialties (n = 9). Sixteen studies evaluated traditional ML models, eight evaluated BERT models, and five evaluated LLMs. Task-specific model fine-tuning was performed in three studies for traditional ML models, in all studies for BERT models, and in one study for LLMs. The overall pooled protocoling accuracy was 85% (95% CI, 83-87%). The pooled accuracy was 83% (95% CI, 80-85%) for traditional ML models, 87% (95% CI, 85-89%) for BERT models, and 86% (95% CI, 83-89%) for LLMs; these pooled accuracies were not significantly different between any pairwise combination of the three AI approaches (all p > .05). 
Among 30 distinct models (14 traditional ML models, nine BERT models, seven LLMs), the top-10 performing models comprised two traditional ML models, six BERT models (including the top performing model [BioBERT, a biomedical-domain BERT; accuracy, 93%]), and two LLMs. Common sources of error included ambiguous requisition text, data imbalance yielding incorrect protocol assignments for low-volume protocols, the presence of multiple clinically reasonable protocols for given requisitions, and difficulty handling requisitions containing terms strongly associated with disparate protocols. CONCLUSION. The top-performing AI models for automated CT and MRI protocoling included predominantly fine-tuned BERT models. CLINICAL IMPACT. AI tools show strong potential to help streamline radiologist workflows, possibly through hybrid AI-radiologist approaches. Fine-tuned LLMs warrant further exploration. TRIAL REGISTRATION. PROSPERO identifier CRD420251088671.
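The random-effects pooling named in this abstract can be sketched with the DerSimonian-Laird estimator, which is one common choice. The abstract does not state which estimator or transformation the authors used, so the logit transform, the function names, and the study counts below are assumptions for illustration only; the counts are hypothetical and are not data from the meta-analysis.

```python
import math


def dersimonian_laird(estimates, variances):
    """Random-effects pooled estimate with a 95% CI, using the DerSimonian-Laird tau^2."""
    k = len(estimates)
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q: weighted dispersion of study estimates around the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2


def _expit(x):
    """Inverse logit: maps a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def pool_accuracies(studies):
    """Pool per-study accuracies given as (n_correct, n_total) on the logit scale.

    Assumes 0 < n_correct < n_total for every study (no continuity correction).
    """
    ys, vs = [], []
    for correct, total in studies:
        wrong = total - correct
        ys.append(math.log(correct / wrong))    # logit of the accuracy
        vs.append(1.0 / correct + 1.0 / wrong)  # approximate variance of the logit
    pooled, (lo, hi), tau2 = dersimonian_laird(ys, vs)
    return _expit(pooled), (_expit(lo), _expit(hi)), tau2


# Hypothetical counts for three studies (not data from the meta-analysis).
acc, ci, tau2 = pool_accuracies([(830, 1000), (430, 500), (1700, 2000)])
print(f"pooled accuracy {acc:.3f}, 95% CI {ci[0]:.3f}-{ci[1]:.3f}, tau^2 {tau2:.4f}")
```

When studies disagree more than sampling error alone would explain, `tau2` grows, the weights become more equal, and the confidence interval widens. The Welch t tests that the abstract uses to compare the three AI approaches are a separate step not shown here.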

  • Research Article
  • 10.52783/jisem.v9i4s.11887
Examine the Opportunities and Challenges of Large Language Model (LLM) For Indic Languages
  • Dec 30, 2024
  • Journal of Information Systems Engineering and Management
  • Brijeshkumar Y Panchal

Large Language Models like GPT and BERT have made significant advancements in NLP, particularly in text generation, translation, and summarization. However, their application in Indic languages remains relatively unexplored due to unique linguistic challenges such as complex morphology, diverse scripts, and limited digitized resources. This systematic literature review follows PRISMA guidelines to identify, analyze, and evaluate existing research on the opportunities and challenges of LLMs for Indic languages. The review covers relevant publications from databases like Web of Science, IEEE Xplore, and Springer. Inclusion criteria were applied to studies published between 2000 and 2024, with a focus on LLM architectures like GPT and BERT as applied to Indic languages. From 161 selected articles, this review highlights the potential of LLMs in improving machine translation, speech recognition, and cross-lingual NLP tasks for Indic languages. The review reveals that while LLMs show promise in enabling better language processing for Indic languages, challenges persist, particularly the scarcity of annotated datasets, script diversity, and the computational resources required for training. LLMs offer great potential to democratize access to digital content for Indic languages, helping bridge the linguistic digital divide. However, the challenges of limited data, linguistic diversity, and bias require collaborative research and tailored solutions. This paper also compares Large Language Models for Indic languages with those for languages in general. This review identifies gaps in the current literature and suggests future directions to improve the efficacy of LLMs for underrepresented languages.

  • Research Article
  • 10.52783/jisem.v10i26s.4236
Examine the Opportunities and Challenges of Large Language Model (LLM) For Indic Languages
  • Mar 28, 2025
  • Journal of Information Systems Engineering and Management
  • Brijeshkumar Y Panchal

Large Language Models like GPT and BERT have made significant advancements in NLP, particularly in text generation, translation, and summarization. However, their application in Indic languages remains relatively unexplored due to unique linguistic challenges such as complex morphology, diverse scripts, and limited digitized resources. This systematic literature review follows PRISMA guidelines to identify, analyze, and evaluate existing research on the opportunities and challenges of LLMs for Indic languages. The review covers relevant publications from databases like Web of Science, IEEE Xplore, and Springer. Inclusion criteria were applied to studies published between 2000 and 2024, with a focus on LLM architectures like GPT and BERT as applied to Indic languages. From 161 selected articles, this review highlights the potential of LLMs in improving machine translation, speech recognition, and cross-lingual NLP tasks for Indic languages. The review reveals that while LLMs show promise in enabling better language processing for Indic languages, challenges persist, particularly the scarcity of annotated datasets, script diversity, and the computational resources required for training. LLMs offer great potential to democratize access to digital content for Indic languages, helping bridge the linguistic digital divide. However, the challenges of limited data, linguistic diversity, and bias require collaborative research and tailored solutions. This paper also compares Large Language Models for Indic languages with those for languages in general. This review identifies gaps in the current literature and suggests future directions to improve the efficacy of LLMs for underrepresented languages.

  • Research Article
  • Cited by 31
  • 10.3390/electronics13112055
A Systematic Literature Review on Using Natural Language Processing in Software Requirements Engineering
  • May 24, 2024
  • Electronics
  • Sabina-Cristiana Necula + 2 more

This systematic literature review examines the integration of natural language processing (NLP) in software requirements engineering (SRE) from 1991 to 2023. Focusing on the enhancement of software requirement processes through technological innovation, this study spans an extensive array of scholarly articles, conference papers, and key journal and conference reports, including data from Scopus, IEEE Xplore, ACM Digital Library, and Clarivate. Our methodology employs both quantitative bibliometric tools, like keyword trend analysis and thematic mapping, and qualitative content analysis to provide a robust synthesis of current trends and future directions. Reported findings underscore the essential roles of advanced computational techniques like machine learning, deep learning, and large language models in refining and automating SRE tasks. This review highlights the progressive adoption of these technologies in response to the increasing complexity of software systems, emphasizing their significant potential to enhance the accuracy and efficiency of requirement engineering practices while also pointing to the challenges of integrating artificial intelligence (AI) and NLP into existing SRE workflows. The systematic exploration of both historical contributions and emerging trends offers new insights into the dynamic interplay between technological advances and their practical applications in SRE.

  • Research Article
  • Cited by 1
  • 10.53858/arocpb05030112
Applications of Artificial Intelligence in Medicine: A Comprehensive Systematic Review and Meta-Analysis
  • Aug 5, 2025
  • AROC in Pharmaceutical and Biotechnology
  • Bolaji Mubarak Ayeyemi + 3 more

Background: The integration of Artificial Intelligence (AI) into medicine constitutes one of the most significant technological paradigm shifts in healthcare history. From the early rule-based expert systems of the 1970s to the current era of Deep Learning (DL) and Large Language Models (LLMs), AI has evolved to rival human performance in specific diagnostic tasks. This systematic review and meta-analysis aims to provide an exhaustive evaluation of AI applications across major medical specialties (Radiology, Pathology, Dermatology, Ophthalmology, Cardiology, Neurology, and Oncology), assess their diagnostic accuracy compared to clinical standards, and analyze the ethical, legal, and social implications of their widespread adoption. Methods: We conducted a PRISMA-compliant systematic search across PubMed, Scopus, Web of Science, IEEE Xplore, and arXiv for studies published between 2010 and 2025. We utilized the QUADAS-2 tool for quality assessment of diagnostic accuracy studies and ROBINS-I for observational studies. Results: A total of 4,250 records were screened, with 120 studies meeting the inclusion criteria for qualitative synthesis and 42 for quantitative meta-analysis. Deep learning models in radiology demonstrated a pooled sensitivity of 87% (95% CI: 85-89%) and specificity of 89% (95% CI: 87-91%). In dermatology, AI algorithms frequently outperformed general practitioners and performed on par with board-certified dermatologists in melanoma detection. Conclusion: AI demonstrates robust diagnostic performance, particularly in image-intensive fields. However, the translation from “code to clinic” is hindered by algorithmic bias, lack of explainability, and regulatory uncertainty. Future efforts must focus on prospective, randomized clinical trials and the development of equitable, robust AI frameworks.

  • Preprint Article
  • 10.2196/preprints.78410
“Is Attention All We Need?” - A Systematic Literature Review of LLMs in Mental Healthcare (Preprint)
  • Jun 2, 2025
  • Andreas Bucher + 4 more

BACKGROUND Mental healthcare systems worldwide face critical challenges, including limited access, shortages of clinicians, and stigma-related barriers. In parallel, Large Language Models (LLMs) have emerged as powerful tools capable of supporting therapeutic processes through natural language understanding and generation. While prior research has explored their potential, a comprehensive review assessing how LLMs are integrated into mental healthcare, particularly beyond technical feasibility, is still lacking. OBJECTIVE This systematic literature review investigates and conceptualizes the application of LLMs in mental healthcare by examining their technical implementation, design characteristics, and situational use across different touchpoints along the patient journey. It introduces a three-layer morphological framework to structure and analyze how LLMs are applied, with the goal of informing future research and design for more effective mental health interventions. METHODS Following the methodology of vom Brocke et al. [1], a systematic literature review was conducted across PubMed, IEEE Xplore, JMIR, ACM, and AIS databases, yielding 807 studies. After multiple evaluation steps, 55 studies were included. These were categorized and analyzed based on the patient journey, design elements, and underlying model characteristics. RESULTS Most studies assessed technical feasibility, whereas only a few examined the impact of LLMs on therapeutic outcomes. LLMs were used primarily for classification and text generation tasks, with limited evaluation of safety, hallucination risks, or reasoning capabilities. Design aspects such as user roles, interaction modalities, and interface elements were often underexplored, despite their significant influence on user experience. Furthermore, most applications focused on single-user contexts, overlooking opportunities for integrated care environments, such as AI-blended therapy. 
The proposed three-layer framework, which consists of the L1: Situation-layer, the L2: Interface-layer, and the L3: LLM-layer, highlights critical design trade-offs and unmet needs in current research. CONCLUSIONS LLMs hold promise for enhancing accessibility, personalization, and efficiency in mental healthcare. However, current implementations often overlook essential design and contextual factors that influence real-world adoption and outcomes. The review underscores that the “self-attention” mechanism, a key component of LLMs, alone is not sufficient. Future research must go beyond technical feasibility to explore integrated care models, user experience, and longitudinal treatment outcomes to responsibly embed LLMs into mental healthcare ecosystems.

  • Research Article
  • Cited by 189
  • 10.1186/s12911-025-02954-4
A systematic review of large language model (LLM) evaluations in clinical medicine
  • Mar 7, 2025
  • BMC Medical Informatics and Decision Making
  • Sina Shool + 5 more

Background: Large Language Models (LLMs), advanced AI tools based on transformer architectures, demonstrate significant potential in clinical medicine by enhancing decision support, diagnostics, and medical education. However, their integration into clinical workflows requires rigorous evaluation to ensure reliability, safety, and ethical alignment. Objective: This systematic review examines the evaluation parameters and methodologies applied to LLMs in clinical medicine, highlighting their capabilities, limitations, and application trends. Methods: A comprehensive review of the literature was conducted across PubMed, Scopus, Web of Science, IEEE Xplore, and arXiv databases, encompassing both peer-reviewed and preprint studies. Studies were screened against predefined inclusion and exclusion criteria to identify original research evaluating LLM performance in medical contexts. Results: The results reveal a growing interest in leveraging LLM tools in clinical settings, with 761 studies meeting the inclusion criteria. While general-domain LLMs, particularly ChatGPT and GPT-4, dominated evaluations (93.55%), medical-domain LLMs accounted for only 6.45%. Accuracy emerged as the most commonly assessed parameter (21.78%). Despite these advancements, the evidence base highlights certain limitations and biases across the included studies, emphasizing the need for careful interpretation and robust evaluation frameworks. Conclusions: The exponential growth in LLM research underscores their transformative potential in healthcare. However, addressing challenges such as ethical risks, evaluation variability, and underrepresentation of critical specialties will be essential. Future efforts should prioritize standardized frameworks to ensure safe, effective, and equitable LLM integration in clinical practice.

  • Research Article
  • 10.1371/journal.pone.0339594
Protocol for a scoping review examining the application of large language models in healthcare education and public health learning spaces
  • Jan 2, 2026
  • PLOS One
  • Henry Ndukwe + 1 more

Objective: Through this scoping review, we aim to explore and synthesize existing knowledge and evidence on the learning approaches for incorporating LLMs into healthcare education and public health research and learning spaces. Specifically, we will attempt to investigate methods for auditing prompts for accuracy, fairness, and effectiveness; tailoring prompts to improve task-specific accuracy and utility; and exploring how end-user feedback is used to refine and optimize LLM prompts over time. This review will provide a comprehensive understanding of how LLMs are being tailored and improved in these fields, contributing to the development of evidence-based strategies for their implementation. It will also identify areas for future research and innovation. Introduction: The increasing integration of large language models (LLMs) into healthcare education and public health research and learning spaces highlights their potential to revolutionize service delivery, decision-making, and ultimately patient care and outcomes. Despite these advancements, understanding how LLMs can be effectively tailored, audited, and refined for learning remains a critical area of inquiry. Key issues include the accuracy of generated information and its relevance to the medical and public health fields. Inclusion criteria: Our focus will be on studies addressing LLM applications in healthcare education and public health research and learning spaces, prompt engineering techniques, prompt auditing methods, and processes geared towards integrating user feedback. Articles that do not focus on healthcare or public health contexts and lack relevance to LLM learning approaches will be excluded. Methods: The review is guided by the JBI methodology for scoping reviews, complemented by updates from Levac et al. 
Databases including PubMed, Scopus, IEEE Xplore, and Web of Science will be searched for peer-reviewed articles, conference proceedings, and grey literature published in English and French from 2015 to 2025. Data extraction will include information on study characteristics, LLM models, prompt engineering strategies, auditing methodologies, and user feedback mechanisms. We will synthesize the extracted data to identify trends, gaps, and best practices in leveraging LLMs to generate baseline data for auditing prompts that optimize AI learning and education needs in the healthcare and public health sector.

  • Supplementary Content
  • Cited by 3
  • 10.2196/76326
Large Language Models in Critical Care Medicine: Scoping Review
  • Nov 24, 2025
  • JMIR Medical Informatics
  • Tongyue Shi + 9 more

Background: With the rapid development of artificial intelligence, large language models (LLMs) have shown strong capabilities in natural language understanding, reasoning, and generation, attracting much research interest in applying LLMs to health and medicine. Critical care medicine (CCM) provides diagnosis and treatment for patients with critical illness who often require intensive monitoring and interventions in intensive care units (ICUs). Whether LLMs can be applied to CCM, and whether they can operate as ICU experts in assisting clinical decision-making rather than “stochastic parrots,” remains uncertain. Objective: This scoping review aims to provide a panoramic portrait of the application of LLMs in CCM, identifying the advantages, challenges, and future potential of LLMs in this field. Methods: This study was conducted in accordance with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. Literature was searched across 7 databases, including PubMed, Embase, Scopus, Web of Science, CINAHL, IEEE Xplore, and ACM Digital Library, from the first available paper to August 22, 2025. Results: From an initial 2342 retrieved papers, 41 were selected for final review. LLMs played an important role in CCM through the following 3 main channels: clinical decision support, medical documentation and reporting, and medical education and doctor-patient communication. Compared to traditional artificial intelligence models, LLMs have advantages in handling unstructured data and do not require manual feature engineering. Meanwhile, applying LLMs to CCM has faced challenges, including hallucinations and poor interpretability, sensitivity to prompts, bias and alignment challenges, and privacy and ethical issues. Conclusions: Although LLMs are not yet ICU experts, they have the potential to become valuable tools in CCM, helping to improve patient outcomes and optimize health care delivery. 
Future research should enhance model reliability and interpretability, improve model training and deployment scalability, integrate up-to-date medical knowledge, and strengthen privacy and ethical guidelines, paving the way for LLMs to fully realize their impact in critical care. Trial Registration: OSF Registries yn328; https://osf.io/yn328/

  • Research Article
  • Cited by 9
  • 10.2196/78410
“It’s Not Only Attention We Need”: Systematic Review of Large Language Models in Mental Health Care
  • Nov 4, 2025
  • JMIR Mental Health
  • Andreas Bucher + 4 more

Background: Mental health care systems worldwide face critical challenges, including limited access, shortages of clinicians, and stigma-related barriers. In parallel, large language models (LLMs) have emerged as powerful tools capable of supporting therapeutic processes through natural language understanding and generation. While previous research has explored their potential, a comprehensive review assessing how LLMs are integrated into mental health care, particularly beyond technical feasibility, is still lacking. Objective: This systematic literature review investigates and conceptualizes the application of LLMs in mental health care by examining their technical implementation, design characteristics, and situational use across different touchpoints along the patient journey. It introduces a 3-layer morphological framework to structure and analyze how LLMs are applied, with the goal of informing future research and design for more effective mental health interventions. Methods: A systematic literature review was conducted across PubMed, IEEE Xplore, JMIR, ACM, and AIS databases, yielding 807 studies. After multiple evaluation steps, 55 studies were included. These were categorized and analyzed based on the patient journey, design elements, and underlying model characteristics. Results: Most studies assessed technical feasibility, whereas only a few examined the impact of LLMs on therapeutic outcomes. LLMs were used primarily for classification and text generation tasks, with limited evaluation of safety, hallucination risks, or reasoning capabilities. Design aspects, such as user roles, interaction modalities, and interface elements, were often underexplored, despite their significant influence on user experience. Furthermore, most applications focused on single-user contexts, overlooking opportunities for integrated care environments, such as artificial intelligence–blended therapy. 
The proposed 3-layer framework, which consists of the L1: LLM layer, L2: interface layer, and L3: situation layer, highlights critical design trade-offs and unmet needs in current research. Conclusions: LLMs hold promise for enhancing accessibility, personalization, and efficiency in mental health care. However, current implementations often overlook essential design and contextual factors that influence real-world adoption and outcomes. The review underscores that the self-attention mechanism, a key component of LLMs, alone is not sufficient. Future research must go beyond technical feasibility to explore integrated care models, user experience, and longitudinal treatment outcomes to responsibly embed LLMs into mental health care ecosystems.

  • Research Article
  • Cited by 6
  • 10.34133/icomputing.0110
Deep Learning and Methods Based on Large Language Models Applied to Stellar Light Curve Classification
  • Jan 1, 2025
  • Intelligent Computing
  • Yu-Yang Li + 6 more

Light curves serve as a valuable source of information on stellar formation and evolution. With the rapid advancement of machine learning techniques, they can be effectively processed to extract astronomical patterns and information. In this study, we present a comprehensive evaluation of models based on deep learning and large language models (LLMs) for the automatic classification of variable star light curves, using large datasets from the Kepler and K2 missions. Special emphasis is placed on Cepheids, RR Lyrae, and eclipsing binaries, examining the influence of observational cadence and phase distribution on classification precision. Employing automated deep learning optimization, we achieve striking performance using 2 architectures: one that combines one-dimensional convolution (Conv1D) with bidirectional long short-term memory (BiLSTM) and another called the Swin Transformer. These achieved accuracies of 94% and 99%, respectively, with the latter demonstrating a notable 83% accuracy in discerning the elusive type II Cepheids that comprise merely 0.02% of the total dataset. We unveil StarWhisper LightCurve (LC), a series of 3 models based on an LLM, a multimodal large language model (MLLM), and a large audio language model (LALM). Each model is fine-tuned with strategic prompt engineering and customized training methods to explore the emergent abilities of these models for astronomical data. Remarkably, StarWhisper LC series models exhibit high accuracies of around 90%, considerably reducing the need for explicit feature engineering, thereby paving the way for streamlined parallel data processing and the progression of multifaceted multimodal models in astronomical applications. 
The study furnishes 2 detailed catalogs illustrating the impacts of phase and sampling intervals on deep learning classification accuracy, showing that a substantial decrease of up to 14% in observation duration and 21% in sampling points can be realized without compromising accuracy by more than 10%.

  • Preprint Article
  • 10.2196/preprints.72062
Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis (Preprint)
  • Feb 2, 2025
  • Hankun Su + 14 more

BACKGROUND The integration of large language models (LLMs) into medical diagnostics has garnered substantial attention due to their potential to enhance diagnostic accuracy, streamline clinical workflows, and address health care disparities. However, the rapid evolution of LLM research necessitates a comprehensive synthesis of their applications, challenges, and future directions. OBJECTIVE This scoping review aimed to provide an overview of the current state of research regarding the use of LLMs in medical diagnostics. The study sought to answer four primary subquestions, as follows: (1) Which LLMs are commonly used? (2) How are LLMs assessed in diagnosis? (3) What is the current performance of LLMs in diagnosing diseases? (4) Which medical domains are investigating the application of LLMs? METHODS This scoping review was conducted according to the Joanna Briggs Institute Manual for Evidence Synthesis and adheres to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). Relevant literature was searched from the Web of Science, PubMed, Embase, IEEE Xplore, and ACM Digital Library databases from 2022 to 2025. Articles were screened and selected based on predefined inclusion and exclusion criteria. Bibliometric analysis was performed using VOSviewer to identify major research clusters and trends. Data extraction included details on LLM types, application domains, and performance metrics. RESULTS The field is rapidly expanding, with a surge in publications after 2023. GPT-4 and its variants dominated research (70/95, 74% of studies), followed by GPT-3.5 (34/95, 36%). Key applications included disease classification (text or image-based), medical question answering, and diagnostic content generation. LLMs demonstrated high accuracy in specialties like radiology, psychiatry, and neurology but exhibited biases in race, gender, and cost predictions. Ethical concerns, including privacy risks and model hallucination, alongside regulatory fragmentation, were critical barriers to clinical adoption. CONCLUSIONS LLMs hold transformative potential for medical diagnostics but require rigorous validation, bias mitigation, and multimodal integration to address real-world complexities. Future research should prioritize explainable artificial intelligence frameworks, specialty-specific optimization, and international regulatory harmonization to ensure equitable and safe clinical deployment.

  • Research Article
  • Cited by 22
  • 10.2196/72062
Large Language Models in Medical Diagnostics: Scoping Review With Bibliometric Analysis.
  • Jun 9, 2025
  • Journal of Medical Internet Research
  • Hankun Su + 14 more

The integration of large language models (LLMs) into medical diagnostics has garnered substantial attention due to their potential to enhance diagnostic accuracy, streamline clinical workflows, and address health care disparities. However, the rapid evolution of LLM research necessitates a comprehensive synthesis of their applications, challenges, and future directions. This scoping review aimed to provide an overview of the current state of research regarding the use of LLMs in medical diagnostics. The study sought to answer four primary subquestions, as follows: (1) Which LLMs are commonly used? (2) How are LLMs assessed in diagnosis? (3) What is the current performance of LLMs in diagnosing diseases? (4) Which medical domains are investigating the application of LLMs? This scoping review was conducted according to the Joanna Briggs Institute Manual for Evidence Synthesis and adheres to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). Relevant literature was searched from the Web of Science, PubMed, Embase, IEEE Xplore, and ACM Digital Library databases from 2022 to 2025. Articles were screened and selected based on predefined inclusion and exclusion criteria. Bibliometric analysis was performed using VOSviewer to identify major research clusters and trends. Data extraction included details on LLM types, application domains, and performance metrics. The field is rapidly expanding, with a surge in publications after 2023. GPT-4 and its variants dominated research (70/95, 74% of studies), followed by GPT-3.5 (34/95, 36%). Key applications included disease classification (text or image-based), medical question answering, and diagnostic content generation. LLMs demonstrated high accuracy in specialties like radiology, psychiatry, and neurology but exhibited biases in race, gender, and cost predictions. Ethical concerns, including privacy risks and model hallucination, alongside regulatory fragmentation, were critical barriers to clinical adoption. LLMs hold transformative potential for medical diagnostics but require rigorous validation, bias mitigation, and multimodal integration to address real-world complexities. Future research should prioritize explainable artificial intelligence frameworks, specialty-specific optimization, and international regulatory harmonization to ensure equitable and safe clinical deployment.
