On biological and artificial consciousness: A case for biological computationalism.

Abstract

The rapid advances in the capabilities of Large Language Models (LLMs) have galvanised public and scientific debates over whether artificial systems might one day be conscious. Prevailing optimism is often grounded in computational functionalism: the assumption that consciousness is determined solely by the right pattern of information processing, independent of the physical substrate. Opposing this, biological naturalism insists that conscious experience is fundamentally dependent on the concrete physical processes of living systems. Despite the centrality of these positions to the artificial consciousness debate, there is currently no coherent framework that explains how biological computation differs from digital computation, and why this difference might matter for consciousness. Here, we argue that the absence of consciousness in artificial systems is not merely due to missing functional organisation but reflects a deeper divide between digital and biological modes of computation and the dynamico-structural dependencies of living organisms. Specifically, we propose that biological systems support conscious processing because they (i) instantiate scale-inseparable, substrate-dependent multiscale processing as a metabolic optimisation strategy, and (ii) perform continuous-valued computations alongside discrete ones, owing to the fluidic substrate from which they are composed. These features, scale inseparability and hybrid computation, are not peripheral but essential to the brain's mode of computation. In light of these differences, we outline the foundational principles of a biological theory of computation and explain why current artificial intelligence systems are unlikely to replicate conscious processing as it arises in biology.

Similar Papers
  • Research Article
  • Cite count: 10
  • 10.1089/cyber.2024.0409
Psychomatics-A Multidisciplinary Framework for Understanding Artificial Minds.
  • Aug 29, 2024
  • Cyberpsychology, behavior and social networking
  • Giuseppe Riva + 4 more

Although large language models (LLMs) and other artificial intelligence systems demonstrate cognitive skills similar to humans, such as concept learning and language acquisition, the way they process information fundamentally differs from biological cognition. To better understand these differences, this article introduces Psychomatics, a multidisciplinary framework bridging cognitive science, linguistics, and computer science. It aims to delve deeper into the high-level functioning of LLMs, focusing specifically on how LLMs acquire, learn, remember, and use information to produce their outputs. To achieve this goal, Psychomatics will rely on a comparative methodology, starting from a theory-driven research question (is the process of language development and use different in humans and LLMs?) and drawing parallels between LLMs and biological systems. Our analysis shows how LLMs can map and manipulate complex linguistic patterns in their training data. Moreover, LLMs can follow Grice's Cooperative Principle to provide relevant and informative responses. However, human cognition draws from multiple sources of meaning, including experiential, emotional, and imaginative facets, which transcend mere language processing and are rooted in our social and developmental trajectories. In addition, current LLMs lack physical embodiment, reducing their ability to make sense of the intricate interplay between perception, action, and cognition that shapes human understanding and expression. Ultimately, Psychomatics holds the potential to yield transformative insights into the nature of language, cognition, and intelligence, both artificial and biological. By drawing parallels between LLMs and human cognitive processes, Psychomatics can also inform the development of more robust and human-like artificial intelligence systems.

  • Research Article
  • Cite count: 1
  • 10.1093/ofid/ofae631.609
P-408. Utility of a Large Language Model for Identifying Central Line-Associated Bloodstream Infections (CLABSI) Using Real Clinical Data at Stanford Health Care
  • Jan 29, 2025
  • Open Forum Infectious Diseases
  • Guillermo Rodriguez-Nava + 4 more

Background: Central line-associated bloodstream infection (CLABSI) surveillance can be subjective and time-consuming. Large language models (LLMs) are advanced artificial intelligence systems with potential to assist healthcare professionals in classification tasks. Stanford Health Care recently implemented one of the first secure LLMs, powered by OpenAI’s GPT 4.0, cleared for sensitive health data. We assessed its performance in classifying CLABSI cases.
[Figure 1: Confusion matrix of LLM performance in CLABSI classification.]
Methods: We selected 40 patients flagged by our surveillance system for CLABSI review from November 2023–March 2024: 20 CLABSIs, consecutively identified, and 20 not-CLABSIs (randomly sampled). We prompted the LLM to determine if patients met the NHSN definition for CLABSI and provided the blood culture results that triggered the alert and the last 2 progress notes from the primary care team at the infection window end (within 3 days after the first positive test). We compared the secure LLM's determinations with those of infection preventionists.
[Table 1: Cases in which the LLM did not agree with the IP assessment for CLABSI. *Community-onset: blood cultures obtained within 2 days of admission. +NHSN guidelines list Fusobacterium nucleatum as an MBI organism (https://www.cdc.gov/nhsn/pdfs/pscmanual/17pscnosinfdef_current.pdf). Abbreviations: BSI, bloodstream infection; CLABSI, central line-associated bloodstream infection; CoNS, coagulase-negative Staphylococci; ESBL, extended-spectrum beta lactamase; HIDA scan; IP, infection preventionist; LLM, large language model; MBI, mucosal barrier injury; MSSA, methicillin-susceptible Staphylococcus aureus; NHSN, National Healthcare Safety Network.]
Results: Across 20 CLABSI-positive and 20 CLABSI-negative cases reviewed, the LLM accurately identified 16 of 20 CLABSIs and 7 of 20 not-CLABSIs. The sensitivity was 80% (95% CI 57.6%–92.9%), specificity was 35% (95% CI 33.3%–86.5%), and the agreement rate was 57.5% (95% CI 41.2%–73.3%). Among 17 discordant cases, 11 involved clinical data available in the chart but unavailable to the LLM: admission information (4 false-positives), matching organisms (4 false-positives), and central line or symptom status (2 false-negatives, 1 false-positive). If this information were available to the LLM, we expect an adjusted sensitivity of 90% (18/20) and adjusted specificity of 80% (16/20). The remaining discordant cases involved misclassifications of organisms and incorrect identification of infection sources by the LLM. The mean review time by infection preventionists was 75 minutes (SD 48.7 minutes) compared to 5 minutes using the LLM.
Conclusion: An LLM not specifically trained for CLABSI classification showed high sensitivity using limited patient data. LLM case review required 5 minutes, versus 1 hour for traditional review. These results suggest LLMs could serve as a "first-pass" screening tool for CLABSI detection, helping infection preventionists narrow records needing human review.
Disclosures: All authors: no reported disclosures.
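
The reported rates follow directly from the confusion-matrix counts given in this abstract. The sketch below is only an illustration of that arithmetic, not the authors' analysis code; the function name `screening_metrics` is hypothetical, and the confidence intervals are not recomputed because the abstract does not state which interval method was used.

```python
def screening_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, agreement) as fractions.

    tp/fn: CLABSI cases the LLM classified correctly / incorrectly.
    tn/fp: not-CLABSI cases the LLM classified correctly / incorrectly.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    agreement = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, agreement


# Counts stated in the abstract: 16 of 20 CLABSIs and 7 of 20 not-CLABSIs correct.
observed = screening_metrics(tp=16, fn=4, tn=7, fp=13)

# Adjusted scenario from the abstract: the 2 false negatives and 9 false positives
# attributed to missing chart data are assumed to be classified correctly.
adjusted = screening_metrics(tp=18, fn=2, tn=16, fp=4)

for label, (sens, spec, agree) in (("observed", observed), ("adjusted", adjusted)):
    print(f"{label}: sensitivity={sens:.1%} specificity={spec:.1%} agreement={agree:.1%}")
```

The adjusted agreement rate printed by this sketch is derived from the adjusted counts and is not a figure reported in the abstract.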

  • Research Article
Automated Grading for Efficiently Evaluating the Dual-Use Biological Capabilities of Large Language Models.
  • Sep 1, 2025
  • Rand health quarterly
  • Bria Persaud + 6 more

Advances in the biological knowledge and reasoning capabilities of large language models (LLMs) have sparked interest in assessing the potential of LLMs to facilitate emerging biological risks. The authors evaluated LLMs' abilities to answer knowledge-based questions and generate protocols that explain how to perform common laboratory techniques that could be used in the creation of proxies for biological threats. Because LLM evaluation approaches that rely on human subject-matter experts are often costly and time-intensive, the authors introduced an automated systematic and scalable method for evaluating the ability of LLMs to generate protocols for laboratory techniques. The results presented confirm prior work indicating that LLMs possess knowledge of the biological sciences. This study is intended to inform evaluators of artificial intelligence systems, academics, technical experts, and policymakers on techniques for examining the risks of the convergence of LLMs and biological threats.

  • Research Article
  • 10.2478/bile-2025-0010
Artificial Intelligence and Consciousness: Limits and Modern Perspectives
  • Dec 1, 2025
  • Biometrical Letters
  • Laura Slebioda

Summary This paper provides a review of selected concepts concerning consciousness, intelligence and artificial intelligence, focusing on their interrelations and interpretative limitations. The aim of the paper is to organize key definitions and viewpoints, and to highlight central issues related to the question of whether conscious machines can ever emerge. Consciousness is often defined as subjective experience, as the capacity for reflection on one’s own mental states, or as an emergent property of complex biological systems. Intelligence, on the other hand, is interpreted as the ability to learn, solve problems, adapt to changing conditions, and control cognitive processes. The development of computational technologies has given rise to weak artificial intelligence, encompassing algorithmic and machine learning systems that can model and predict patterns with high precision. Within this category, generative artificial intelligence, represented by large language models, demonstrates impressive linguistic capabilities but lacks genuine understanding – a feature associated with strong AI. The paper discusses whether computational processes can be equated with real thinking, referring to Gödel’s incompleteness theorems, Searle’s Chinese Room argument, as well as the Turing Test. This review contributes by integrating classical philosophical arguments with a comparative evaluation of contemporary language models (GPT-5, Gemini 2.5, DeepSeek-V3.2), examining their responses to Gödelian questions and reasoning tasks. The analysis indicates that, despite significant progress in building artificial intelligence systems, the question of their potential consciousness remains unresolved and continues to be a subject of profound philosophical debate.

  • Research Article
  • Cite count: 13
  • 10.2196/70535
Unveiling the Potential of Large Language Models in Transforming Chronic Disease Management: Mixed Methods Systematic Review.
  • Apr 16, 2025
  • Journal of medical Internet research
  • Caixia Li + 7 more

Chronic diseases are a major global health burden, accounting for nearly three-quarters of the deaths worldwide. Large language models (LLMs) are advanced artificial intelligence systems with transformative potential to optimize chronic disease management; however, robust evidence is lacking. This review aims to synthesize evidence on the feasibility, opportunities, and challenges of LLMs across the disease management spectrum, from prevention to screening, diagnosis, treatment, and long-term care. Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analysis) guidelines, 11 databases (Cochrane Central Register of Controlled Trials, CINAHL, Embase, IEEE Xplore, MEDLINE via Ovid, ProQuest Health & Medicine Collection, ScienceDirect, Scopus, Web of Science Core Collection, China National Knowledge Infrastructure, and SinoMed) were searched on April 17, 2024. Intervention and simulation studies that examined LLMs in the management of chronic diseases were included. The methodological quality of the included studies was evaluated using a rating rubric designed for simulation-based research and the risk of bias in nonrandomized studies of interventions tool for quasi-experimental studies. Narrative analysis with descriptive figures was used to synthesize the study findings. Random-effects meta-analyses were conducted to assess the pooled effect estimates of the feasibility of LLMs in chronic disease management. A total of 20 studies examined general-purpose (n=17) and retrieval-augmented generation-enhanced LLMs (n=3) for the management of chronic diseases, including cancer, cardiovascular diseases, and metabolic disorders. LLMs demonstrated feasibility across the chronic disease management spectrum by generating relevant, comprehensible, and accurate health recommendations (pooled accuracy rate 71%, 95% CI 59%-83%; I²=88.32%), with retrieval-augmented generation-enhanced LLMs having higher accuracy rates than general-purpose LLMs (odds ratio 2.89, 95% CI 1.83-4.58; I²=54.45%). LLMs facilitated equitable information access; increased patient awareness regarding ailments, preventive measures, and treatment options; and promoted self-management behaviors in lifestyle modification and symptom coping. Additionally, LLMs facilitated compassionate emotional support, social connections, and access to health care resources to improve the health outcomes of chronic diseases. However, LLMs face challenges in addressing privacy, language, and cultural issues; undertaking advanced tasks, including diagnosis, medication, and comorbidity management; and generating personalized regimens with real-time adjustments and multiple modalities. LLMs have demonstrated the potential to transform chronic disease management at the individual, social, and health care levels; however, their direct application in clinical settings is still in its infancy. A multifaceted approach that incorporates robust data security, domain-specific model fine-tuning, multimodal data integration, and wearables is crucial for the evolution of LLMs into invaluable adjuncts for health care professionals to transform chronic disease management. PROSPERO CRD42024545412; https://www.crd.york.ac.uk/PROSPERO/view/CRD42024545412.

  • Research Article
  • 10.12731/2658-4034-2024-15-1-444
THE FUTURE OF FOREIGN LANGUAGE TEACHING IN THE CONTEXT OF THE 5TH COGNITIVE REVOLUTION
  • Feb 29, 2024
  • Russian Journal of Education and Psychology
  • Irina S Volegzhanina

Purpose. The article addresses possible ways of transforming foreign language teaching, an issue that is crucial given the exponential shift in the technological paradigm that contemporary society is experiencing. The subject of this research is the changes in this process within the expected 5th Cognitive Revolution. The author aims to show the prospects for the professional development of foreign language department instructors in view of the mass implementation of artificial intelligence systems based on Large Language Models. Methodology. The research is based on the personalised and symbolic approaches, as well as the principle of interdisciplinarity. Results. The results and novelty of this study are that the author specifies the periodisation of cognitive revolutions as applied to education by analysing Russian and foreign research. It is the technologies of the 5th Cognitive Revolution, distinguished by the synergy of human and machine intelligence, that will determine the nature of the transformation of foreign language teaching in universities. The author assumes that the work of instructors in foreign language departments will be the most sensitive to professional turbulence due to the achievements of world research in the field of Large Language Models. The professional development of instructors appears to be related to their ability to interact with artificial systems that have emotional intelligence and multimodal behaviour, such as artificial intelligence tutors and non-biological department employees within the hybrid education environment of a university. Practical implications. The results of this research can be applied in the development of technologies and methods of teaching foreign languages at universities.

  • Preprint Article
  • 10.20944/preprints202504.1933.v1
Evaluating Logical Reasoning Ability of Large Language Models
  • Apr 23, 2025
  • Preprints.org
  • Emunah Chan

Large language models (LLMs) such as ChatGPT and DeepSeek have recently made significant progress in natural language processing, demonstrating reasoning ability close to human intelligence. This has sparked considerable research interest, since reasoning is a hallmark of human intelligence that is widely considered missing in artificial intelligence systems. Due to the large size of these models, evaluation of LLMs’ reasoning ability is largely empirical. Creating datasets to evaluate the reasoning ability of LLMs is an active research area. A key open question is whether LLMs reason or simply recite memorized texts they have encountered during their training phase. This work conducts simple experiments using Cheryl’s Birthday Puzzle and Cheryl’s Age Puzzle to investigate whether LLMs recite or reason, and discovers that LLMs tend to recite memorized answers for well-known questions, which appear frequently on the internet. As a result, to accurately evaluate the reasoning ability of LLMs, it is essential to create new datasets to ensure that LLMs truly use their reasoning ability to generate responses to the presented questions. In view of this finding, this work proposes a new dataset comprising questions that require semantic and deductive logical reasoning skills to elicit reasoning ability from LLMs. The proposed evaluation framework has several desirable properties, including resilience to training data contamination, ease of response verification, extensibility, usefulness, and automated test case generation. This work applies the proposed dataset to evaluate the reasoning ability of state-of-the-art LLMs, including GPT-3, GPT-4, Llama-3.1, Gemini-1.5, Claude-3.5 and DeepSeek-V3. A significant observation is that most LLMs achieve a performance independent of question complexity. This suggests that they reason more like an algorithm than like human intelligence. In contrast, DeepSeek-V3 most closely resembles human reasoning behaviour among all the tested LLMs. Finally, an algorithm to automatically generate the dataset of logical reasoning questions is presented.

  • Research Article
  • Cite count: 3
  • 10.3390/app15031103
Bidirectional Semantic Communication Between Humans and Machines Based on Data, Information, Knowledge, Wisdom, and Purpose Artificial Consciousness
  • Jan 22, 2025
  • Applied Sciences
  • Yingtian Mei + 1 more

Large language models (LLMs) and other artificial intelligence systems are trained using extensive DIKWP resources (data, information, knowledge, wisdom, purpose). These introduce uncertainties when applied to individual users in a collective semantic space. Traditional methods often lead to introducing new concepts rather than a proper understanding based on the semantic space. When dealing with complex problems or insufficient context, the limitations in conceptual cognition become even more evident. To address this, we take pediatric consultation as a scenario, using case simulations to specifically discuss unidirectional communication impairments between doctors and infant patients and the bidirectional communication biases between doctors and infant parents. We propose a human–machine interaction model based on DIKWP artificial consciousness. For the unidirectional communication impairment, we use the example of an infant’s perspective in recognizing and distinguishing objects, simulating the cognitive process of the brain from non-existence to existence, transitioning from cognitive space to semantic space, and generating corresponding semantics for DIKWP, abstracting concepts, and labels. For the bidirectional communication bias, we use the interaction between infant parents and doctors as an example, mapping the interaction process to the DIKWP transformation space and addressing the DIKWP 3-No problem (incompleteness, inconsistency, and imprecision) for both parties. We employ a purpose-driven DIKWP transformation model to solve part of the 3-No problem. Finally, we comprehensively validate the proposed method (DIKWP-AC). We first analyze, evaluate, and compare the DIKWP transformation calculations and processing capabilities, and then compare it with seven mainstream large models. The results show that DIKWP-AC performs well. 
Constructing a novel cognitive model reduces the information gap in human–machine interactions, promotes mutual understanding and communication, and provides a new pathway for achieving more efficient and accurate artificial consciousness interactions.

  • Discussion
  • Cite count: 7
  • 10.1016/j.lanmic.2024.07.017
The future of large language models in fighting emerging outbreaks: lights and shadows
  • Jul 30, 2024
  • The Lancet Microbe
  • Alberto Rizzo + 2 more

  • Conference Article
  • Cite count: 4
  • 10.1162/isal_a_00759
From Text to Life: On the Reciprocal Relationship between Artificial Life and Large Language Models
  • Jan 1, 2024
  • Eleni Nisioti + 9 more

Large Language Models (LLMs) have taken the field of AI by storm, but their adoption in the field of Artificial Life (ALife) has been, so far, relatively reserved. In this work we investigate the potential synergies between LLMs and ALife, drawing on a large body of research in the two fields. We explore the potential of LLMs as tools for ALife research, for example, as operators for evolutionary computation or the generation of open-ended environments. Reciprocally, principles of ALife, such as self-organization, collective intelligence and evolvability, can provide an opportunity for shaping the development and functionalities of LLMs, leading to more adaptive and responsive models. By investigating this dynamic interplay, the paper aims to inspire innovative crossover approaches for both ALife and LLM research. Along the way, we examine the extent to which LLMs appear to increasingly exhibit properties such as emergence or collective intelligence, expanding beyond their original goal of generating text, and potentially redefining our perception of lifelike intelligence in artificial systems.

  • Research Article
  • 10.1016/j.ejso.2026.111741
The utility of large language models in oncological multidisciplinary team meetings: A systematic review.
  • Mar 6, 2026
  • European journal of surgical oncology: the journal of the European Society of Surgical Oncology and the British Association of Surgical Oncology
  • Swetha Prabhakaran + 3 more

Large language models (LLMs) have emerged in recent years as innovative artificial intelligence systems with early potential in clinical decision-making. This is the first systematic review to evaluate LLMs' oncological decision-making and compare their treatment recommendations to "gold standard" oncological multi-disciplinary team (MDT) decision-making. PubMed, EMBASE and Medline databases were last searched on 20th January in line with PRISMA guidelines. All relevant peer-reviewed publications comparing LLM and MDT treatment recommendations in patients with cancer were included. Studies using fictional cases, case reports, and conference proceedings were excluded. A modified QUADAS-2 tool was used for bias assessment. The primary outcome was the concordance between LLM and MDT treatment recommendations. 34 publications met the inclusion criteria, with a total of 3513 patient cases included in this review. Studies were highly heterogeneous with regard to study design, sample size, cancers studied, and LLMs evaluated, among others. Concordance rates ranged from 16 to 100% across all studies. The highest concordance rates were noted in prostate cancer cases, where the LLM was directed to incorporate established international guidelines in decision-making. One third of studies exhibited a high level of bias. Limitations to LLM decision-making include overtreatment of frail patients, lack of reproducibility, insufficient niche knowledge, occasional life-threatening recommendations, and medico-legal issues including privacy and confidentiality. LLMs may be capable of generating appropriate oncological treatment recommendations, but early outcomes are inconsistent and conflict across the various studies with regard to safety. Robust prospective comparative studies are still needed to better determine their utility in this setting.

  • Research Article
  • Cite count: 38
  • 10.1016/j.hcc.2025.100300
On protecting the data privacy of Large Language Models (LLMs) and LLM agents: A literature review
  • Jun 1, 2025
  • High-Confidence Computing
  • Biwei Yan + 6 more

  • Research Article
  • 10.17223/17267080/96/2
The Influence of Large Language Models (LLMs; ChatGPT) on Student Creativity
  • Jan 1, 2025
  • Sibirskiy Psikhologicheskiy Zhurnal
  • Valery I Kabrin + 5 more

The active integration of generative artificial intelligence systems into the field of education and the mixed results of their use in the socio-humanitarian sphere raise questions about the quality of their impact on creativity as a universal competency of personal productivity. This study focuses on exploring the influence of large language models (LLMs), such as ChatGPT, on student creativity. Objective. To examine the impact of LLMs (ChatGPT) on student creativity in solving academic tasks from the perspective of objective assessment and subjective self-evaluation of its manifestation. Materials and Methods. The sample consisted of two groups of first-year philosophy students, an experimental group (EG) and a control group (CG), each comprising 30 students. Both groups completed four creative tasks as part of their academic coursework; the EG used LLMs (ChatGPT) for idea generation and information retrieval, while the CG did not use LLMs. The results were assessed using five creativity criteria based on J.P. Guilford and E.P. Torrance's models (uniqueness, non-obviousness, novelty of approach, transformation, and quantity of ideas), a "Self-Assessment of Creativity" questionnaire with open-ended questions, and the MAI-32 "Metacognitive Awareness Inventory" test. Qualitative analysis was conducted using content analysis; statistical processing was performed using analysis of variance; and effect sizes (Cohen's d and Hedges' g) were calculated to assess the magnitude of differences. Results. The hypothesis that LLMs positively influence student creativity was not supported. Objective assessment showed higher creativity results in the CG, which did not use LLMs. Moreover, in two tasks, a significant negative effect was observed, possibly due to fixation on the technique of working with the tool. Paradoxically, subjective self-assessment of creativity was statistically significantly higher in the EG, which used LLMs, resembling a "placebo effect," as the process of working with LLMs was perceived as more creative and productive at the self-perception level, despite objectively lower results. Conclusion. The active imagination of students without the use of LLMs yields objectively higher results across all creativity criteria and induces critical self-reflection. To enhance the effectiveness of LLMs, it is advisable to introduce them as a "timely prompt" only after intellectual and emotional frustration arises, which will stimulate the awakening of creative intuition.
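
The effect sizes named in this abstract (Cohen's d and Hedges' g) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses the pooled-standard-deviation form of d and the common approximate small-sample correction for g, and the scores are invented placeholders, not data from the study. The abstract does not say which variants the authors used.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardised mean difference using the pooled sample standard deviation."""
    n1, n2 = len(group_a), len(group_b)
    pooled_var = ((n1 - 1) * variance(group_a) + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def hedges_g(group_a, group_b):
    """Cohen's d multiplied by the approximate small-sample bias correction."""
    n1, n2 = len(group_a), len(group_b)
    correction = 1 - 3 / (4 * (n1 + n2) - 9)
    return cohens_d(group_a, group_b) * correction


# Hypothetical creativity scores, for illustration only (not from the study).
control_scores = [2, 4, 6]
experimental_scores = [1, 3, 5]

d = cohens_d(control_scores, experimental_scores)
g = hedges_g(control_scores, experimental_scores)
print(f"Cohen's d = {d:.2f}, Hedges' g = {g:.2f}")
```

With only a handful of observations per group the correction factor matters; with 30 students per group, as in the study design described, d and g would differ only slightly.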

  • Research Article
  • 10.3348/kjr.2025.1045
Evaluating the Accuracy and Diagnostic Reasoning of Multimodal Large Language Models in Interpreting Neuroradiology Cases From RadioGraphics.
  • Jan 1, 2026
  • Korean journal of radiology
  • Pae Sun Suh + 6 more

To evaluate the accuracy and reasoning capabilities of large multimodal language models compared with those of neuroradiology subspecialty-trained radiologists in neuroradiology case interpretation. This experimental study used 401 custom-made radiologic quizzes derived from articles published in RadioGraphics covering neuroradiology and head and neck topics (October 2020 to February 2024). We prompted the GPT-4 Turbo with Vision (GPT-4V), GPT-4 Omni, Gemini Flash, and Claude models to provide the top three differential diagnoses with a rationale and describe examination characteristics such as imaging modality, sequence, use of contrast, image plane, and body part. The temperature was adjusted to 0 and 1 (T1). Two neuroradiologists answered the same questions. The accuracies of the large language models (LLMs) and the neuroradiologists were compared using generalized estimating equations. Three neuroradiologists assessed the rationale provided by the LLMs for their differential diagnoses using four-point scales, separately for specific lesion locations and imaging findings, and evaluated the presence of hallucinations and the overall acceptability of the responses. Top-3 accuracy (i.e., correct answers present among top-3 differential diagnoses) of LLMs ranged from 29.9% (120 of 401) to 49.4% (198 of 401, obtained with GPT-4V in the T1 setting), while radiologists achieved 80.3% (322 of 401) and 68.3% (274 of 401), respectively (P < 0.001). Regarding the rationale for differential diagnoses, GPT-4V (T1) accurately identified both the specific lesion location and imaging findings in 30.7% (123 of 401) and 12.9% (16 of 124) of cases without textual clinical history. Hallucinations occurred in 4.5% (18 of 401), and only 29.4% (118 of 401) of the LLM-generated analyses were deemed acceptable. GPT-4V (T1) demonstrated high accuracy in identifying the imaging modality (97.4% [800 of 821]) and scanned body parts (92.2% [756 of 820]).
LLMs remarkably underperformed compared with neuroradiologists and showed unsatisfactory reasoning for their differential diagnoses, with performance declining further in cases without textual input of clinical history. These findings highlight the limitations of current multimodal LLMs in neuroradiological interpretation and their reliance on text input.
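
The top-3 accuracy metric defined in this abstract (the correct answer appears among the top three differential diagnoses) can be sketched as below. The function name `top_k_accuracy` and the example diagnoses are hypothetical and exact string matching is an assumption; the study's actual scoring procedure is not described here.

```python
def top_k_accuracy(ranked_predictions, correct_answers, k=3):
    """Fraction of cases whose correct answer is among the first k predictions."""
    if len(ranked_predictions) != len(correct_answers):
        raise ValueError("each case needs exactly one correct answer")
    hits = sum(
        answer in predictions[:k]
        for predictions, answer in zip(ranked_predictions, correct_answers)
    )
    return hits / len(correct_answers)


# Hypothetical example cases, for illustration only.
predictions = [
    ["glioblastoma", "metastasis", "abscess"],
    ["meningioma", "schwannoma", "lymphoma"],
]
answers = ["abscess", "paraganglioma"]
example_accuracy = top_k_accuracy(predictions, answers)

# The percentages in the abstract are the same ratio applied to the reported counts.
reported = {label: round(100 * hits / 401, 1) for label, hits in
            {"lowest LLM": 120, "GPT-4V (T1)": 198, "radiologist 1": 322, "radiologist 2": 274}.items()}
print(example_accuracy, reported)
```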

  • Research Article
  • Cite count: 20
  • 10.1136/bmjonc-2025-000759
Large language models in oncology: a review.
  • May 1, 2025
  • BMJ oncology
  • David Chen + 7 more

Large language models (LLMs) have demonstrated emergent human-like capabilities in natural language processing, leading to enthusiasm about their integration in healthcare environments. In oncology, where synthesising complex, multimodal data is essential, LLMs offer a promising avenue for supporting clinical decision-making, enhancing patient care, and accelerating research. This narrative review aims to highlight the current state of LLMs in medicine; applications of LLMs in oncology for clinicians, patients, and translational research; and future research directions. Clinician-facing LLMs enable clinical decision support and automated data extraction from electronic health records and literature to inform decision-making. Patient-facing LLMs offer the potential for disseminating accessible cancer information and psychosocial support. However, LLMs face limitations that must be addressed before clinical adoption, including risks of hallucinations, poor generalisation, ethical concerns, and scope integration. We propose the incorporation of LLMs within compound artificial intelligence systems to facilitate adoption and efficiency in oncology. This narrative review serves as a non-technical primer for clinicians to understand, evaluate, and participate as active users who can inform the design and iterative improvement of LLM technologies deployed in oncology settings. While LLMs are not intended to replace oncologists, they can serve as powerful tools to augment clinical expertise and patient-centred care, reinforcing their role as a valuable adjunct in the evolving landscape of oncology.
