Evaluating large language models for software testing

  • Abstract
  • Literature Map
  • Similar Papers
Abstract

Similar Papers
  • Research Article
  • Cited by: 8
  • 10.1287/ijds.2023.0007
How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
  • Apr 1, 2023
  • INFORMS Journal on Data Science
  • Galit Shmueli + 7 more

  • Research Article
  • 10.1093/ndt/gfae069.792
#2924 Comparison of large language models and traditional natural language processing techniques in predicting arteriovenous fistula failure
  • May 23, 2024
  • Nephrology Dialysis Transplantation
  • Suman Lama + 6 more

Background and Aims: Large language models (LLMs) have gained significant attention in the field of natural language processing (NLP), marking a shift from traditional techniques like Term Frequency-Inverse Document Frequency (TF-IDF). We developed a traditional NLP model to predict arteriovenous fistula (AVF) failure within the next 30 days using clinical notes. The goal of this analysis was to investigate whether LLMs would outperform traditional NLP techniques, specifically in the context of predicting AVF failure within the next 30 days using clinical notes.

Method: We defined AVF failure as a change in status from active to permanently or temporarily unusable. We used data from a large kidney care network from January 2021 to December 2021. Two models were created: one using an LLM and one using the traditional TF-IDF technique. We used "distilbert-base-uncased", a distilled version of the BERT base model [1], and compared its performance with traditional TF-IDF-based NLP techniques. The dataset was randomly divided into 60% training, 20% validation and 20% test sets. The test data, comprising unseen patients' data, was used to evaluate the performance of the models. Both models were evaluated using metrics such as area under the receiver operating characteristic curve (AUROC), accuracy, sensitivity, and specificity.

Results: The incidence of 30-day AVF failure was 2.3% in the population. Both the LLM and the traditional model showed similar overall performance, as summarized in Table 1. Notably, the LLM showed marginally better performance on certain evaluation metrics. Both models had the same AUROC of 0.64 on test data. The accuracy and balanced accuracy for the LLM were 72.9% and 59.7%, respectively, compared to 70.9% and 59.6% for the traditional TF-IDF approach. In terms of specificity, the LLM scored 73.2%, slightly higher than the 71.2% observed for the traditional NLP method. However, the LLM had a lower sensitivity of 46.1% compared to 48% for traditional NLP. It is also worth noting that training the LLM took considerably longer than TF-IDF and required more computational resources, such as graphics processing unit (GPU) instances in cloud-based services, leading to higher cost.

Conclusion: In our study, we found that an advanced LLM performs comparably to traditional TF-IDF modeling techniques in predicting AVF failure. Both models demonstrated identical AUROC. While specificity was higher for the LLM, sensitivity was higher for traditional NLP. The LLM was fine-tuned on a limited dataset, which could have influenced its performance to be similar to that of traditional NLP methods. This finding suggests that while LLMs may excel in certain scenarios, such as in-depth sentiment analysis of patient data for complex tasks, their effectiveness is highly dependent on the specific use case. It is crucial to weigh the benefits against the resources required, as LLMs can be significantly more resource-intensive and costly than traditional TF-IDF methods. This highlights the importance of a use-case-driven approach in selecting the appropriate NLP technique for healthcare applications.
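The TF-IDF baseline described above can be illustrated with a minimal pure-Python sketch; the toy "clinical notes", the whitespace tokenization, and the labels implied by them are invented for illustration and are not from the study's data:

```python
import math
from collections import Counter

def tfidf(docs):
    """Compute TF-IDF weight vectors (as dicts) for tokenized documents."""
    n = len(docs)
    # document frequency: number of docs containing each term
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        vectors.append({
            t: (c / total) * math.log(n / df[t])  # tf * idf
            for t, c in tf.items()
        })
    return vectors

# hypothetical one-line access notes
notes = [
    "fistula patent good thrill".split(),
    "fistula occluded no thrill".split(),
    "access patent strong thrill".split(),
]
vecs = tfidf(notes)
# "thrill" appears in every note, so its idf (and weight) is zero;
# rare terms like "occluded" dominate the second note's vector
print(max(vecs[1], key=vecs[1].get))
```

In the real pipeline these sparse weights would feed a downstream classifier; the point of TF-IDF is simply that terms frequent in one note but rare across the corpus carry the most signal.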

  • Research Article
  • 10.55041/ijsrem36608
Exploring Vulnerabilities and Threats in Large Language Models: Safeguarding Against Exploitation and Misuse
  • Aug 10, 2024
  • International Journal of Scientific Research in Engineering and Management
  • Mr Aarush Varma + 1 more

This research paper delves into the inherent vulnerabilities and potential threats posed by large language models (LLMs), focusing on their implications across diverse applications such as natural language processing and data privacy. The study aims to identify and analyze these risks comprehensively, emphasizing the importance of mitigation strategies to prevent exploitation and misuse in LLM deployments. In recent years, LLMs have revolutionized fields like automated content generation, sentiment analysis, and conversational agents, yet their immense capabilities also raise significant security concerns. Vulnerabilities such as bias amplification, adversarial attacks, and unintended data leakage can undermine trust and compromise user privacy. Through a systematic examination of these challenges, this paper proposes safeguarding measures crucial for responsibly harnessing the potential of LLMs while minimizing associated risks. It underscores the necessity of rigorous security protocols, including robust encryption methods, enhanced authentication mechanisms, and continuous monitoring frameworks. Furthermore, the research discusses regulatory implications and ethical considerations surrounding LLM usage, advocating for transparency, accountability, and stakeholder engagement in policy-making and deployment practices. By synthesizing insights from current literature and real-world case studies, this study provides a comprehensive framework for stakeholders (developers, policymakers, and users) to navigate the complex landscape of LLM security effectively. Ultimately, this research aims to inform future advancements in LLM technology, ensuring its safe and beneficial integration into various domains while mitigating potential risks to individuals and society as a whole.
Keywords— Adversarial attacks on LLMs, Bias in LLMs, Data privacy in LLMs, Ethical considerations LLMs, Exploitation of LLMs, Large Language Models (LLMs), Misuse of LLMs, Mitigation strategies for LLMs, Natural Language Processing (NLP), Regulatory frameworks LLMs, Responsible deployment of LLMs, Risks of LLMs, Security implications of LLMs, Threats to LLMs, Vulnerabilities in LLMs.

  • Research Article
  • Cited by: 8
  • 10.1016/j.procs.2023.09.086
A Large and Diverse Arabic Corpus for Language Modeling
  • Jan 1, 2023
  • Procedia Computer Science
  • Abbas Raza Ali + 3 more

  • Research Article
  • Cited by: 30
  • 10.1093/jamia/ocae074
Large language models for biomedicine: foundations, opportunities, challenges, and best practices.
  • Apr 24, 2024
  • Journal of the American Medical Informatics Association : JAMIA
  • Satya S Sahoo + 8 more

Generative large language models (LLMs) are a subset of transformer-based neural network architecture models. LLMs have successfully leveraged a combination of an increased number of parameters, improvements in computational efficiency, and large pre-training datasets to perform a wide spectrum of natural language processing (NLP) tasks. Using a few examples (few-shot) or no examples (zero-shot) for prompt-tuning has enabled LLMs to achieve state-of-the-art performance in a broad range of NLP applications. This article by the American Medical Informatics Association (AMIA) NLP Working Group characterizes the opportunities, challenges, and best practices for our community to leverage and advance the integration of LLMs in downstream NLP applications effectively. This can be accomplished through a variety of approaches, including augmented prompting, instruction prompt tuning, and reinforcement learning from human feedback (RLHF). Our focus is on making LLMs accessible to the broader biomedical informatics community, including clinicians and researchers who may be unfamiliar with NLP. Additionally, NLP practitioners may gain insight from the described best practices. We focus on 3 broad categories of NLP tasks, namely natural language understanding, natural language inferencing, and natural language generation. We review the emerging trends in prompt tuning, instruction fine-tuning, and evaluation metrics used for LLMs while drawing attention to several issues that impact biomedical NLP applications, including falsehoods in generated text (confabulation/hallucinations), toxicity, and dataset contamination leading to overfitting. We also review potential approaches to address some of these current challenges in LLMs, such as chain-of-thought prompting, and the phenomena of emergent capabilities observed in LLMs that can be leveraged to address complex NLP challenges in biomedical applications.
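As a small illustration of the few-shot prompting the abstract discusses, the sketch below assembles a prompt from worked examples; the task, labels, and example sentences are hypothetical and not from the article:

```python
def build_few_shot_prompt(examples, query, instruction):
    """Assemble a few-shot prompt: instruction, worked examples, then query."""
    parts = [instruction, ""]
    for text, label in examples:
        parts.append(f"Text: {text}\nLabel: {label}\n")
    # the model is expected to complete the final, unlabeled example
    parts.append(f"Text: {query}\nLabel:")
    return "\n".join(parts)

# hypothetical clinical negation-detection task
examples = [
    ("Patient denies chest pain.", "negated"),
    ("Patient reports chest pain.", "affirmed"),
]
prompt = build_few_shot_prompt(
    examples,
    "Patient denies shortness of breath.",
    "Classify whether the clinical finding is affirmed or negated.",
)
print(prompt)
```

Zero-shot prompting is the same construction with an empty `examples` list: the instruction alone must carry the task description.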

  • Research Article
  • 10.1007/s10143-025-03785-7
Current trends and future prospects of language models and processing systems in spine surgery - a scoping review.
  • Sep 5, 2025
  • Neurosurgical review
  • Vivek Sanker + 9 more

Natural language processing (NLP) systems and large language models (LLMs), such as ChatGPT, represent transformative advancements in artificial intelligence (AI). Their implementation in the medical field has broad potential, and this review discusses the current trends and prospects of NLP and LLM tools in spine surgery, assessing their potential benefits, applications, and limitations. The methodology involved a comprehensive narrative review of existing English literature related to the use of NLP and LLM tools in spine surgery. We searched the databases PubMed, EMBASE, Web of Science and Scopus from inception until 16th June 2025 using keywords revolving around LLMs, natural language processing and spine surgery. Original studies, clinical reports, and case series were included, while abstracts or unpublished studies were excluded. From 221 initial records, 37 studies were included: 18 evaluated LLMs and 19 evaluated NLP-based tools. LLMs were commonly used for clinical decision-making (n = 8), patient counseling (n = 7), classification (n = 2), and research (n = 1). NLP tools were applied in classification tasks (n = 12), clinical decision-making (n = 3), patient counseling (n = 1), postoperative opioid monitoring (n = 2), and research registry development (n = 1). ChatGPT-4 achieved up to 92% accuracy in clinical recommendations, outperforming GPT-3.5 in multiple tasks. Comparative analyses have found that newer versions of LLMs, such as ChatGPT-4, outperform previous versions, as evidenced by greater accuracy and a lower rate of artificial hallucination. However, limitations persist, including overconfident outputs, adherence gaps to clinical guidelines, and inconsistent patient readability. While this review suggests that NLP and LLM tools can have a significant impact on spine practice, it is important to keep their limitations in mind and implement them with caution.
To maximize the benefits of these models in spine surgery, future research should focus on improving model sensitivity and specificity, promoting multi-disciplinary collaborations, and addressing ethical considerations regarding the use of language models in medical practice, including the inherent issue of hallucination of these models.

  • Research Article
  • 10.63345/jqst.v2i1.163
Deploying Large Language Models (LLMs) for Automated Test Case Generation and QA Evaluation
  • Jan 1, 2025
  • Journal of Quantum Science and Technology
  • Vybhav Reddy Kammireddy Changalreddy + 1 more

The deployment of Large Language Models (LLMs) for automated test case generation and quality assurance (QA) evaluation represents a significant advancement in software testing. With the increasing complexity of modern applications, traditional methods of test case creation and manual evaluation have proven inefficient and error-prone. LLMs, with their ability to understand natural language inputs and generate contextually relevant outputs, offer a promising solution to this challenge. This paper explores the application of LLMs to automate the generation of test cases, ensuring broader coverage and improved accuracy in detecting potential software defects. By leveraging the vast training data of LLMs, these models can interpret requirements, user stories, or functional specifications and automatically generate a diverse set of test cases that address various use cases and edge cases. Additionally, LLMs can be employed for real-time QA evaluation, analyzing the results of test executions and identifying discrepancies, inconsistencies, or anomalies that may otherwise be overlooked. This paper also highlights the integration of LLMs with existing testing frameworks and CI/CD pipelines, showcasing how they can augment human efforts, reduce time-to-market, and improve the overall reliability of software products. Through case studies and experiments, we demonstrate the effectiveness of LLMs in enhancing test case generation and QA evaluation, paving the way for more efficient, scalable, and robust software testing practices in the era of artificial intelligence.
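One way the generate-and-validate loop described above might look in practice is sketched below; the JSON schema, the prompt wording, and the stubbed model are assumptions for illustration, not the paper's implementation:

```python
import json

def generate_test_cases(requirement, model):
    """Ask an LLM for test cases as JSON, then parse and schema-check them."""
    prompt = (
        "Generate test cases as a JSON list of objects with "
        f"'name', 'input', and 'expected' keys for: {requirement}"
    )
    reply = model(prompt)
    cases = json.loads(reply)
    # basic schema check before handing cases to a test runner
    for case in cases:
        assert {"name", "input", "expected"} <= case.keys()
    return cases

# stub standing in for a real LLM call
def fake_model(prompt):
    return json.dumps([
        {"name": "empty input", "input": "", "expected": "error"},
        {"name": "valid email", "input": "a@b.co", "expected": "ok"},
    ])

cases = generate_test_cases("validate an email field", fake_model)
print(len(cases))
```

The validation step matters in real pipelines: model replies are untrusted text, so parsing failures and schema violations should be caught before any generated case reaches CI.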

  • Preprint Article
  • 10.20944/preprints202504.1933.v1
Evaluating Logical Reasoning Ability of Large Language Models
  • Apr 23, 2025
  • Emunah Chan

Large language models (LLMs) such as ChatGPT and DeepSeek have recently made significant progress in natural language processing, demonstrating reasoning ability close to human intelligence. This has sparked considerable research interest since reasoning is a hallmark of human intelligence that is widely considered missed in artificial intelligence systems. Due to the large size of these models, evaluation of LLMs&amp;rsquo; reasoning ability is largely empirical. Creating datasets to evaluate the reasoning ability of LLMs is an active research area. A key open question is whether LLMs reason or simply recite memorized texts they have encountered during their training phase. This work conducts simple experiments using Cheryl&amp;rsquo;s Birthday Puzzle and Cheryl&amp;rsquo;s Age Puzzle to investigate whether LLMs recite or reason and discovers that LLMs tend to recite memorized answers for well-known questions, which appear frequently on the internet. As a result, to accurately evaluate the reasoning ability of LLMs, it is essential to create new datasets to ensure that LLMs truly use their reasoning ability to generate responses to the presented questions. In view of the finding, this work proposes a new dataset comprising of questions requiring semantic and deductive logical reasoning skills to elicit reasoning ability from LLMs. The proposed evaluation framework has several desirable properties, including resilience to training data contamination, ease of response verification, extensibility, usefulness and automated test case generation. This work applies the proposed dataset to evaluate the reasoning ability of state-of-the-art LLMs, including GPT-3, GPT-4, Llama-3.1, Germini-1.5, Claude-3.5 and DeepSeek-V3. A significant observation is that most LLMs achieve a performance independent of question complexity. This suggests that they reason more like an algorithm than human intelligence. 
In contrast, DeepSeek-V3 resembles human reasoning behaviour most among all the tested LLMs. Finally, an algorithm to automatically generate the dataset of logical reasoning questions is presented.

  • Research Article
  • 10.54254/2755-2721/2025.22701
Systematically Understanding of Code Semantic Interpretation for LLMs
  • May 15, 2025
  • Applied and Computational Engineering
  • Jia Zhao

Understanding and interpreting code is a crucial task in intelligent software engineering, aiding developers and users in adjusting code for correctness and robustness. The emergence of large language models (LLMs) provides new perspectives for code interpretation tasks. However, current LLM-based code interpretation remains restricted to limited dimensions, lacks a unified evaluation standard, and is missing a comprehensive and systematic assessment methodology. To address this issue, this paper proposes an LLM code understanding evaluation method based on a multi-granularity voting mechanism, aiming to systematically investigate and analyze LLMs' performance in code interpretation tasks. First, we carefully select code snippets from open-source GitHub projects and preprocess them for LLM analysis. Second, we use identical prompts and inputs to test three popular LLMs, recording their output. During this process, we apply prompt engineering techniques to specific target code snippets and conduct repeated experiments to explore the impact of prompt engineering on LLM-generated code explanations. Next, we design evaluation metrics to quantify the LLM outputs and assess their effectiveness based on the obtained scores. Experimental results demonstrate significant differences in code analysis and generation capabilities among the evaluated general-purpose LLMs from different vendors when given identical prompts and inputs. When multiple dimensions are considered in evaluating the generated content, different LLMs exhibit varying strengths in different aspects. Additionally, applying specific prompt engineering techniques can moderate the discrepancies in code analysis and generation capabilities among different LLMs.
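A multi-granularity voting mechanism of the kind described could, for instance, aggregate per-dimension scores across raters by majority vote; the dimension names and ratings below are hypothetical, not the paper's data:

```python
from collections import defaultdict
from statistics import mode

def aggregate_scores(ratings):
    """Majority vote per evaluation dimension across several raters."""
    by_dim = defaultdict(list)
    for rater in ratings:
        for dim, score in rater.items():
            by_dim[dim].append(score)
    return {dim: mode(scores) for dim, scores in by_dim.items()}

# hypothetical 1-5 ratings of one LLM's code explanation from three
# raters, at three granularity levels (names are illustrative only)
ratings = [
    {"line_level": 4, "function_level": 5, "project_level": 3},
    {"line_level": 5, "function_level": 4, "project_level": 3},
    {"line_level": 4, "function_level": 4, "project_level": 2},
]
scores = aggregate_scores(ratings)
print(scores)
```

Voting at each granularity separately lets a model score well on line-by-line explanation while still being penalized for weak project-level summaries, which matches the multi-dimensional comparison the paper reports.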

  • Research Article
  • Cited by: 13
  • 10.1108/jebde-08-2023-0015
Unraveling the landscape of large language models: a systematic review and future perspectives
  • Dec 19, 2023
  • Journal of Electronic Business & Digital Economics
  • Qinxu Ding + 4 more

Purpose: The rapid rise of large language models (LLMs) has propelled them to the forefront of applications in natural language processing (NLP). This paper aims to present a comprehensive examination of the research landscape in LLMs, providing an overview of the prevailing themes and topics within this dynamic domain.

Design/methodology/approach: Drawing from an extensive corpus of 198 records published between 1996 and 2023 from the relevant academic database, encompassing journal articles, books, book chapters, conference papers and selected working papers, this study delves deep into the multifaceted world of LLM research. The authors employed the BERTopic algorithm, a recent advancement in topic modeling, to conduct a comprehensive analysis of the data after it had been meticulously cleaned and preprocessed. BERTopic leverages the power of transformer-based language models like bidirectional encoder representations from transformers (BERT) to generate more meaningful and coherent topics. This approach facilitates the identification of hidden patterns within the data, enabling the authors to uncover valuable insights that might otherwise have remained obscure.

Findings: The analysis revealed four distinct clusters of topics in LLM research: “language and NLP”, “education and teaching”, “clinical and medical applications” and “speech and recognition techniques”. Each cluster embodies a unique aspect of LLM application and showcases the breadth of possibilities that LLM technology has to offer. In addition to presenting the research findings, this paper identifies key challenges and opportunities in the realm of LLMs. It underscores the necessity for further investigation in specific areas, including the paramount importance of addressing potential biases, transparency and explainability, data privacy and security, and responsible deployment of LLM technology.

Practical implications: This classification offers practical guidance for researchers, developers, educators, and policymakers to focus efforts and resources. The study underscores the importance of addressing challenges in LLMs, including potential biases, transparency, data privacy, and responsible deployment. Policymakers can utilize this information to shape regulations, while developers can tailor technology development based on the diverse applications identified. The findings also emphasize the need for interdisciplinary collaboration and highlight ethical considerations, providing a roadmap for navigating the complex landscape of LLM research and applications.

Originality/value: This study stands out as the first to examine the evolution of LLMs across such a long time frame and across such diversified disciplines. It provides a unique perspective on the key areas of LLM research, highlighting the breadth and depth of LLMs' evolution.

  • Research Article
  • Cited by: 1
  • 10.1145/3749840
AdaptiveLog: An Adaptive Log Analysis Framework with the Collaboration of Large and Small Language Model
  • Jul 22, 2025
  • ACM Transactions on Software Engineering and Methodology
  • Lipeng Ma + 8 more

Automated log analysis is crucial to ensure the high availability and reliability of complex systems. The advent of large language models (LLMs) in natural language processing (NLP) has ushered in a new era of language model-driven automated log analysis, garnering significant interest. Within this field, two primary paradigms based on language models for log analysis have become prominent. Small Language Models (SLMs) (such as BERT) follow the pre-train and fine-tune paradigm, focusing on a specific log analysis task through fine-tuning on supervised datasets. On the other hand, LLMs (such as ChatGPT), following the in-context learning paradigm, analyze logs by providing a few examples in prompt contexts without updating parameters. Despite their respective strengths, both models exhibit inherent limitations. Comparing SLMs and LLMs, we notice that SLMs are more cost-effective but less powerful, whereas LLMs with large parameter counts are highly powerful but expensive and inefficient. To trade off between the performance and inference costs of both models in automated log analysis, this paper introduces an adaptive log analysis framework known as AdaptiveLog, which effectively reduces the costs associated with the LLM while ensuring superior results. This framework pairs an LLM with a small language model, strategically allocating the LLM to tackle complex logs while delegating simpler logs to the SLM. Specifically, to query the LLM efficiently, we propose an adaptive selection strategy based on the uncertainty estimation of the SLM, where the LLM is invoked only when the SLM is uncertain. In addition, to enhance the reasoning ability of the LLM in log analysis tasks, we propose a novel prompt strategy that retrieves similar error-prone cases as references, enabling the model to leverage past error experiences and learn solutions from these cases.
We evaluate AdaptiveLog on different log analysis tasks; extensive experiments demonstrate that AdaptiveLog achieves state-of-the-art results across different tasks, elevating the overall accuracy of log analysis while maintaining cost efficiency. Our source code and detailed experimental data are available at https://github.com/LeaperOvO/AdaptiveLog-review.
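The uncertainty-based routing AdaptiveLog describes can be sketched as follows; the entropy criterion, the 0.7 threshold, and the stub models are illustrative assumptions, not the paper's actual implementation:

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a predictive distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def route(log_line, slm, llm, threshold=0.7):
    """Invoke the cheap SLM first; escalate to the LLM only when the
    SLM's predictive distribution is too uncertain (high entropy)."""
    label, probs = slm(log_line)
    if entropy(probs) <= threshold:
        return label, "slm"
    return llm(log_line), "llm"

# stubs standing in for real models: the SLM is confident on the first
# log line and uncertain on the second
def fake_slm(line):
    if "error" in line:
        return "anomaly", [0.95, 0.05]
    return "normal", [0.55, 0.45]

fake_llm = lambda line: "anomaly"

print(route("disk error on /dev/sda", fake_slm, fake_llm))
print(route("unexpected token in config", fake_slm, fake_llm))
```

Because most production logs are routine, even a crude confidence gate like this sends only a small fraction of lines to the expensive model, which is the cost saving the framework targets.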

  • Research Article
  • Cited by: 3
  • 10.1016/j.artmed.2024.103009
Developing healthcare language model embedding spaces
  • Oct 31, 2024
  • Artificial Intelligence In Medicine
  • Niall Taylor + 4 more

Pre-trained Large Language Models (LLMs) have revolutionised Natural Language Processing (NLP) tasks, but often struggle when applied to specialised domains such as healthcare. The traditional approach of pre-training on large datasets followed by task-specific fine-tuning is resource-intensive and poorly aligned with the constraints of many healthcare settings. This presents a significant challenge for deploying LLM-based NLP solutions in medical contexts, where data privacy, computational resources, and domain-specific language pose unique obstacles. This study aims to develop and evaluate efficient methods for adapting smaller LLMs to healthcare-specific datasets and tasks. We seek to identify pre-training approaches that can effectively instil healthcare competency in compact LLMs under tight computational budgets, a crucial capability for responsible and sustainable deployment in local healthcare settings. We explore three specialised pre-training methods to adapt smaller LLMs to different healthcare datasets: traditional Masked Language modelling (MLM), Deep Contrastive Learning for Unsupervised Textual Representations (DeCLUTR), and a novel approach utilising metadata categories from healthcare settings. These methods are assessed across multiple healthcare datasets, with a focus on downstream document classification tasks. We evaluate the performance of the resulting LLMs through classification accuracy and analysis of the derived embedding spaces. Contrastively trained models consistently outperform other approaches on classification tasks, delivering strong performance with limited labelled data and fewer model parameter updates. While our novel metadata-based pre-training does not further improve classifications across datasets, it yields interesting embedding cluster separability. Importantly, all domain-adapted LLMs outperform their publicly available, general-purpose base models, validating the importance of domain specialisation. This research demonstrates the efficacy of specialised pre-training methods in adapting compact LLMs to healthcare tasks, even under resource constraints. We provide guidelines for pre-training specialised healthcare LLMs and motivate continued inquiry into contrastive objectives. Our findings underscore the potential of these approaches for aligning small LLMs with privacy-sensitive medical tasks, offering a path toward more efficient and responsible NLP deployment in healthcare settings. This work contributes to the broader goal of making advanced NLP techniques accessible and effective in specialised domains, particularly where resource limitations and data sensitivity are significant concerns.

  • Research Article
  • Cited by: 77
  • 10.3390/app14052074
A Review of Current Trends, Techniques, and Challenges in Large Language Models (LLMs)
  • Mar 1, 2024
  • Applied Sciences
  • Rajvardhan Patil + 1 more

Natural language processing (NLP) has significantly transformed in the last decade, especially in the field of language modeling. Large language models (LLMs) have achieved SOTA performance on natural language understanding (NLU) and natural language generation (NLG) tasks by learning language representations in self-supervised ways. This paper provides a comprehensive survey to capture the progression of advances in language models. In this paper, we examine the different aspects of language models, which started with a few million parameters but have reached the size of a trillion in a very short time. We also look at how these LLMs transitioned from task-specific to task-independent to task-and-language-independent architectures. This paper extensively discusses different pretraining objectives, benchmarks, and transfer learning methods used in LLMs. It also examines different finetuning and in-context learning techniques used in downstream tasks. Moreover, it explores how LLMs can perform well across many domains and datasets if sufficiently trained on a large and diverse dataset. Next, it discusses how, over time, the availability of cheap computational power and large datasets has improved LLMs' capabilities and raised new challenges. As part of our study, we also inspect LLMs from the perspective of scalability to see how their performance is affected by the model's depth, width, and data size. Lastly, we provide an empirical comparison of existing trends and techniques and a comprehensive analysis of where the field of LLMs currently stands.

  • Research Article
  • 10.1609/aaai.v39i1.32018
Simulate and Eliminate: Revoke Backdoors for Generative Large Language Models
  • Apr 11, 2025
  • Proceedings of the AAAI Conference on Artificial Intelligence
  • Haoran Li + 6 more

With rapid advances, generative large language models (LLMs) dominate various Natural Language Processing (NLP) tasks from understanding to reasoning. Yet, language models' inherent vulnerabilities may be exacerbated due to increased accessibility and unrestricted model training on massive data. A malicious adversary may publish poisoned data online and conduct backdoor attacks on victim LLMs pre-trained on the poisoned data. Backdoored LLMs behave innocuously for normal queries and generate harmful responses when the backdoor trigger is activated. Despite significant efforts devoted to LLMs' safety issues, LLMs still struggle against backdoor attacks. As Anthropic recently revealed, existing safety training strategies, including supervised fine-tuning (SFT) and Reinforcement Learning from Human Feedback (RLHF), fail to revoke the backdoors once the LLM is backdoored during the pre-training stage. In this paper, we present Simulate and Eliminate (SANDE) to erase undesired backdoored mappings for generative LLMs. We initially propose Overwrite Supervised Fine-tuning (OSFT) for effective backdoor removal when the trigger is known. Then, to handle scenarios where trigger patterns are unknown, we integrate OSFT into our two-stage framework, SANDE. Unlike other works that assume access to cleanly trained models, our safety-enhanced LLMs are able to revoke backdoors without any reference. Consequently, our safety-enhanced LLMs no longer produce targeted responses when the backdoor triggers are activated. We conduct comprehensive experiments to show that our proposed SANDE is effective against backdoor attacks while bringing minimal harm to LLMs' powerful capabilities.

  • Research Article
  • Cited by: 6
  • 10.1016/j.jpainsymman.2024.11.016
Large language models to identify advance care planning in patients with advanced cancer
  • Mar 1, 2025
  • Journal of Pain and Symptom Management
  • Nicole D Agaronnik + 4 more
