Attention economics, artificial intelligence, and the future of the planning profession
Attention economics is the study of the allocation of attention, conceptualized as a scarce resource. In this essay I relate fundamental insights from attention economics to recent advances in a specific type of artificial intelligence known as Large Language Models (LLMs), such as OpenAI's GPT. I argue that the development leap known as the ‘LLM revolution’ can be expected to have a fundamental impact on planning practice. However, we should be careful not to fixate on the expectation that LLMs will necessarily always deliver superior ‘intelligence’. Rather, it may be more helpful to think of them as providing relatively cheap synthetic competent attention, given that attention scarcity, rather than information or knowledge scarcity, is the critical bottleneck in many contexts of contemporary planning practice. The essay teases out the implications of such a perspective, with a particular focus on what it could mean for the future of the planning profession.
- Research Article
- 10.1287/ijds.2023.0007
- Apr 1, 2023
- INFORMS Journal on Data Science
How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
- Research Article
- 10.1162/daed_e_01897
- May 1, 2022
- Daedalus
This dialogue is from an early scene in the 2014 film Ex Machina, in which Nathan has invited Caleb to determine whether Nathan has succeeded in creating artificial intelligence.1 The achievement of powerful artificial general intelligence has long held a grip on our imagination, not only for its exciting as well as worrisome possibilities, but also for its suggestion of a new, uncharted era for humanity. In opening his 2021 BBC Reith Lectures, titled "Living with Artificial Intelligence," Stuart Russell states that "the eventual emergence of general-purpose artificial intelligence [will be] the biggest event in human history."2

Over the last decade, a rapid succession of impressive results has brought wider public attention to the possibilities of powerful artificial intelligence. In machine vision, researchers demonstrated systems that could recognize objects as well as, if not better than, humans in some situations. Then came the games. Complex games of strategy have long been associated with superior intelligence, and so when AI systems beat the best human players at chess, Atari games, Go, shogi, StarCraft, and Dota, the world took notice. It was not just that AIs beat humans (although that was astounding when it first happened), but the escalating progression of how they did it: initially by learning from expert human play, then from self-play, then by teaching themselves the principles of the games from the ground up, eventually yielding single systems that could learn, play, and win at several structurally different games, hinting at the possibility of generally intelligent systems.3

Speech recognition and natural language processing have also seen rapid and headline-grabbing advances. Most impressive has been the recent emergence of large language models capable of generating human-like outputs. Progress in language is of particular significance given the role language has always played in human notions of intelligence, reasoning, and understanding. While the advances mentioned thus far may seem abstract, those in driverless cars and robots have been more tangible given their embodied and often biomorphic forms. Demonstrations of such embodied systems exhibiting increasingly complex and autonomous behaviors in our physical world have captured public attention.

Also in the headlines have been results in various branches of science in which AI and its related techniques have been used as tools to advance research, from materials and environmental sciences to high energy physics and astronomy.4 A few highlights, such as the spectacular results on the fifty-year-old protein-folding problem by AlphaFold, suggest the possibility that AI could soon help tackle science's hardest problems, such as in health and the life sciences.5

While the headlines tend to feature results and demonstrations of a future to come, AI and its associated technologies are already here and pervade our daily lives more than many realize. Examples include recommendation systems, search, language translators (now covering more than one hundred languages), facial recognition, speech to text (and back), digital assistants, chatbots for customer service, fraud detection, decision support systems, energy management systems, and tools for scientific research, to name a few. In all these examples and others, AI-related techniques have become components of other software and hardware systems as methods for learning from and incorporating messy real-world inputs into inferences, predictions, and, in some cases, actions.
As director of the Future of Humanity Institute at the University of Oxford, Nick Bostrom noted back in 2006, "A lot of cutting-edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore."6

As the scope, use, and usefulness of these systems have grown for individual users, researchers in various fields, companies and other types of organizations, and governments, so too have concerns when the systems have not worked well (such as bias in facial recognition systems), or have been misused (as in deepfakes), or have resulted in harms to some (in predicting crime, for example), or have been associated with accidents (such as fatalities from self-driving cars).7

Dædalus last devoted a volume to the topic of artificial intelligence in 1988, with contributions from several of the founders of the field, among others. Much of that issue was concerned with questions of whether research in AI was making progress, of whether AI was at a turning point, and of its foundations (mathematical, technical, and philosophical), with much disagreement. However, in that volume there was also a recognition, or perhaps a rediscovery, of an alternative path toward AI, the connectionist learning approach and the notion of neural nets, and a burgeoning optimism for this approach's potential. Since the 1960s, the learning approach had been relegated to the fringes in favor of the symbolic formalism for representing the world, our knowledge of it, and how machines can reason about it. Yet no essay captured some of the mood at the time better than Hilary Putnam's "Much Ado About Not Very Much." Putnam questioned the Dædalus issue itself: "Why a whole issue of Dædalus? Why don't we wait until AI achieves something and then have an issue?" He concluded: […]

This volume of Dædalus is indeed the first since 1988 to be devoted to artificial intelligence. It does not rehash the same debates; much else has happened since, mostly as a result of the success of the machine learning approach that was being rediscovered and reimagined, as discussed in the 1988 volume. This issue aims to capture where we are in AI's development and how its growing uses impact society. The themes and concerns herein are colored by my own involvement with AI. Besides the television, films, and books that I grew up with, my interest in AI began in earnest in 1989 when, as an undergraduate at the University of Zimbabwe, I undertook a research project to model and train a neural network.9 I went on to do research on AI and robotics at Oxford. Over the years, I have been involved with researchers in academia and labs developing AI systems, studying AI's impact on the economy, tracking AI's progress, and working with others in business, policy, and labor grappling with its opportunities and challenges for society.10

The authors of the twenty-five essays in this volume range from AI scientists and technologists at the frontier of many of AI's developments to social scientists at the forefront of analyzing AI's impacts on society. The volume is organized into ten sections. Half of the sections are focused on AI's development, the other half on its intersections with various aspects of society. In addition to the diversity in their topics, expertise, and vantage points, the authors bring a range of views on the possibilities, benefits, and concerns for society.
I am grateful to the authors for accepting my invitation to write these essays.

Before proceeding further, it may be useful to say what we mean by artificial intelligence. The headlines and increasing pervasiveness of AI and its associated technologies have led to some conflation and confusion about what exactly counts as AI. This has not been helped by the current trend, among researchers in science and the humanities, startups, established companies, and even governments, to associate anything involving not only machine learning, but data science, algorithms, robots, and automation of all sorts with AI. This could simply reflect the hype now associated with AI, but it could also be an acknowledgment of the success of the current wave of AI and its related techniques and their wide-ranging use and usefulness. I think both are true; but it has not always been like this. In the period now referred to as the AI winter, during which progress in AI did not live up to expectations, there was a reticence to associate most of what we now call AI with AI.

Two types of definitions are typically given for AI. The first are those that suggest that it is the ability to artificially do what intelligent beings, usually humans, can do. For example, artificial intelligence is: […] The human abilities invoked in such definitions include visual perception, speech recognition, and the capacity to reason, solve problems, discover meaning, generalize, and learn from experience. Definitions of this type are considered by some to be limiting in their human-centricity as to what counts as intelligence and in the benchmarks for success they set for the development of AI (more on this later). The second type of definition tries to be free of human-centricity and defines an intelligent agent or system, whatever its origin, makeup, or method, as: […] This type of definition also suggests the pursuit of goals, which could be given to the system, self-generated, or learned.13 That both types of definitions are employed throughout this volume yields insights of its own.

These definitional distinctions notwithstanding, the term AI, much to the chagrin of some in the field, has come to be what cognitive and computer scientist Marvin Minsky called a "suitcase word."14 It is packed variously, depending on who you ask, with approaches for achieving intelligence, including those based on logic, probability, information and control theory, neural networks, and various other learning, inference, and planning methods, as well as their instantiations in software, hardware, and, in the case of embodied intelligence, systems that can perceive, move, and manipulate objects.

Three questions cut through the discussions in this volume: 1) Where are we in AI's development? 2) What opportunities and challenges does AI pose for society? 3) How much about AI is really about us?

Notions of intelligent machines date all the way back to antiquity.15 Philosophers, too, among them Hobbes, Leibniz, and Descartes, have been dreaming about AI for a long time; Daniel Dennett suggests that Descartes may have even anticipated the Turing Test.16 The idea of computation-based machine intelligence traces to Alan Turing's invention of the universal Turing machine in the 1930s, and to the ideas of several of his contemporaries in the mid-twentieth century. But the birth of artificial intelligence as we know it, and the use of the term, is generally attributed to the now famed Dartmouth summer workshop of 1956.
The workshop was the result of a proposal for a two-month summer project by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon whereby "An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves."17

In their respective contributions to this volume, "From So Simple a Beginning: Species of Artificial Intelligence" and "If We Succeed," and in different but complementary ways, Nigel Shadbolt and Stuart Russell chart the key ideas and developments in AI, its periods of excitement as well as the aforementioned AI winters. The current AI spring has been underway since the 1990s, with headline-grabbing breakthroughs appearing in rapid succession over the last ten years or so: a period that Jeffrey Dean describes in the title of his essay as a "golden decade," not only for the pace of AI development but also for its use in a wide range of sectors of society, as well as areas of scientific research.18 This period is best characterized by the approach to achieving artificial intelligence through learning from experience, and by the success of neural networks, deep learning, and reinforcement learning, together with methods from probability theory, as ways for machines to learn.19

A brief history may be useful here. In the 1950s, there were two dominant visions of how to achieve machine intelligence. One vision was to use computers to create a logic and symbolic representation of the world and our knowledge of it and, from there, create systems that could reason about the world, thus exhibiting intelligence akin to the mind. This vision was most espoused by Allen Newell and Herbert Simon, along with Marvin Minsky and others. Closely associated with it was the "heuristic search" approach that supposed intelligence was essentially a problem of exploring a space of possibilities for answers. The second vision was inspired by the brain, rather than the mind, and sought to achieve intelligence by learning. In what became known as the connectionist approach, units called perceptrons were connected in ways inspired by the connections of neurons in the brain. At the time, this approach was most associated with Frank Rosenblatt. While there was initial excitement about both visions, the first came to dominate, and did so for decades, with some successes, including so-called expert systems.

Not only did this approach benefit from championing by its advocates and plentiful funding, it came with the suggested weight of a long intellectual tradition, exemplified by Descartes, Boole, Frege, Russell, and Church, among others, that sought to manipulate symbols and to formalize and axiomatize knowledge and reasoning. It was only in the late 1980s that interest began to grow again in the second vision, largely through the work of David Rumelhart, Geoffrey Hinton, James McClelland, and others.
The history of these two visions and the associated philosophical ideas is discussed in Hubert Dreyfus and Stuart Dreyfus's 1988 Dædalus essay "Making a Mind Versus Modeling the Brain: Artificial Intelligence Back at a Branchpoint."20 Since then, the approach to intelligence based on learning, the use of statistical methods, back-propagation, and training (supervised and unsupervised) has come to characterize the current dominant approach.

Kevin Scott, in his essay "I Do Not Think It Means What You Think It Means: Artificial Intelligence, Cognitive Work & Scale," reminds us of the work of Ray Solomonoff and others linking information and probability theory with the idea of machines that can not only learn, but compress and potentially generalize what they learn, and of the emerging realization of this in the systems now being built and those to come. The success of the machine learning approach has benefited from the boom in the availability of data to train the algorithms, thanks to the growth in the use of the Internet and other applications and services. In research, the data explosion has been the result of new scientific instruments and observation platforms and data-generating breakthroughs, for example, in astronomy and in genomics. Equally important has been the co-evolution of the software and hardware used, especially chip architectures better suited to the parallel computations involved in data- and compute-intensive neural networks and other machine learning approaches, as Dean discusses.

Several authors delve into progress in key subfields of AI.21 In their essay "Searching for Computer Vision North Stars," Fei-Fei Li and Ranjay Krishna chart developments in machine vision and the creation of standard data sets, such as ImageNet, that could be used for benchmarking performance. In their respective essays "Human Language Understanding & Reasoning" and "The Curious Case of Commonsense Intelligence," Chris Manning and Yejin Choi discuss different eras and ideas in natural language processing, including the recent emergence of large language models comprising hundreds of billions of parameters, which use transformer architectures and self-supervised learning on vast amounts of data.22 The resulting pretrained models are impressive in their capacity to take natural language prompts for which they have not been trained specifically and generate human-like outputs, not only in natural language, but also images, software code, and more, as Mira Murati discusses and illustrates in "Language & Coding Creativity." Some have started to refer to these large language models as foundation models in that, once they are trained, they are adaptable to a wide range of tasks and outputs.23 But despite their unexpected performance, these large language models are still early in their development and have many shortcomings and limitations that are highlighted in this volume and elsewhere, including by some of their developers.24

In "The Machines from Our Future," Daniela Rus discusses the progress in robotic systems, including advances in the underlying technologies, as well as in their integrated design that enables them to operate in the physical world. She highlights the limitations of the "industrial" approaches used thus far and suggests new ways of conceptualizing robots that draw on insights from biological systems.
In robotics, as in AI more generally, there has always been a tension as to whether to copy or simply draw inspiration from how humans and other biological organisms achieve intelligent behavior. Elsewhere, AI researcher Demis Hassabis and colleagues have explored how neuroscience and AI learn from and inspire each other, although so far more in one direction than the other.

For all the success of the current approaches to AI, there are still many shortcomings: systems that do not work as intended, that exhibit bias, that can be misused, or that rely on flawed or incomplete information about the world, all of which can contribute to an erosion of public trust. These shortcomings have captured the attention of the wider public as well as of researchers, and there is now a growing emphasis on responsible AI. Recent years have seen a proliferation of commitments to principles and approaches for responsible AI, as well as collaborative initiatives aimed at identifying and sharing best practices. Equally important has been the question of who participates in researching and developing AI, in both academia and industry, as has been well documented recently; this matters in its own right, but also for the characteristics of the resulting AI and its intersections with society.

There are also limits to what current AI can do, and open questions about what it would take to achieve more robust, more capable, or more general AI. In their Turing Lecture, deep learning pioneers Yoshua Bengio, Yann LeCun, and Geoffrey Hinton took stock of where deep learning stands and highlighted its current limitations. In the case of natural language processing, Manning and Choi examine the challenges in reasoning and commonsense understanding that remain despite the success of large language models, and elsewhere others have challenged the notion that large language models do anything resembling learning or understanding. Another essay in the volume discusses the problems of multi-agent systems, such as how agents reason about other agents, and the challenges that arise especially when groups include both humans and machines. More broadly, there is a growing recognition that we do not yet have adequate means of evaluating AI systems, especially as they become more capable and their contexts of use expand. And although AI and its related techniques are proving to be powerful tools for research in science, as examples in this volume and elsewhere show, the possibility that more powerful AI could contribute to new discoveries and to progress on some of science's grand challenges has long been a key motivation for many at the frontier of AI research. Underlying all of this is the question of whether the current paradigm, characterized by deep learning, ever larger data sets and foundation models, and reinforcement learning, will suffice, or whether different approaches are needed, such as cognitive agent approaches or approaches based on logic and probability theory, to name a few.

Whether and what mix of approaches will ultimately be needed is unresolved, but many note that the current methods, along with advances in scale and learning architectures, have yet to reach their limits. Bound up with this question of approaches is the question of whether artificial general intelligence can be achieved and, if so, how and when. Artificial general intelligence is usually defined in contrast to what is sometimes called narrow AI, that is, AI built for specific tasks and goals. The development of artificial general intelligence, on the other hand, aims for more powerful AI, at least as powerful as humans: generally capable across problems and domains and, in some conceptions, possessing the capacity to learn and improve itself as well as to set and pursue goals of its own. How and when such intelligence will be achieved is a matter of debate, but most agree that its achievement would have profound consequences, as is often imagined in fiction and film, from 2001: A Space Odyssey through Her to Ex Machina. Whether or not it is imminent, there is growing agreement among many at the frontier of AI research that we should prepare for the possibility of powerful AI, with respect to aligning its goals and values with those of humans, anticipating its misuse, and building these considerations into how we approach its development now.

Much of the research, development, and investment in AI is commercial. This is understandable given the demand for useful applications and the potential for value creation in many sectors of the economy. However, a few organizations have made the development of artificial general intelligence their explicit goal, and each has demonstrated results of increasing generality, though all remain a long way from it.

The most discussed impact of AI and automation is on jobs and the future of work. This is not new. In the mid-twentieth century, amid both excitement about automation and concern about its impact on employment, a common conclusion was that such technologies were important for growth and productivity and that they destroy jobs "but not work." Most recent assessments, including those I have been involved in, have concluded that over time more jobs are created than destroyed, and that it is the transitions, the skills, and the distribution of gains that will pose the greatest challenges. Essays in this volume discuss these implications for work and workers, as well as the risks and opportunities, especially in developing economies. In "The Turing Trap: The Promise & Peril of Human-Like Artificial Intelligence," Erik Brynjolfsson discusses how the use of human benchmarks in the development of AI encourages AI that substitutes for, rather than complements, human labor. He argues that the path AI's development takes in this regard, and the resulting outcomes for workers, will depend on the incentives facing researchers, companies, and governments.

Until now, automation has mostly affected physical and routine tasks, but AI will increasingly bear on cognitive tasks, and, if early examples are a guide, even creative tasks are not out of its reach. In other words, there are now machines in the world that learn, and the range of problems they can take on will grow along with the range of activities to which human ingenuity has been applied. Two counterarguments are usually offered. One is that new kinds of labor will emerge in which humans will be preferred by other humans for their own sake, even when machines may be capable of performing the same tasks as well as or better than people. The other is that AI will create so much abundance that, as Keynes imagined, "for the first time since his creation man will be faced with his real, his permanent problem - how to use his freedom from pressing economic cares, how to occupy the leisure, which science and compound interest will have won for him, to live wisely and agreeably and well." However, most researchers agree that we are not close to a future in which the need for human work disappears, and that until then there are other pressing issues in the labor market, now and in the near term, such as wages, inequality, and how humans will work with increasingly capable machines, issues that several contributors discuss in this volume.

Jobs are not the only societal impact of AI. Russell offers a sense of the enormous value that could flow from general-purpose AI, were it ever achieved. But even well short of general-purpose AI, the opportunities for companies and for economies, in productivity and growth as well as in new products and services, are more than enough to motivate the pursuit of, and investment in, the development, deployment, and use of AI by companies and countries alike. At the same time, it is generally acknowledged that there is a race underway in AI, as reflected in the growth of AI research and investment, and, as several essays highlight, this competition will have consequences for companies and for countries, given the strategic character of such technologies. Countries may differ in their approaches to AI and its governance, such as in the roles played by companies and by the state, and not all have the resources to compete in AI. The role of AI in national security, in intelligence systems, and in autonomous weapons, among other increasingly consequential applications, raises further questions still.
- Research Article
- 10.1182/blood-2025-6214
- Nov 3, 2025
- Blood
Evaluating artificial intelligence (AI) as a clinical decision support tool for AML patients
- Research Article
- 10.1016/j.joms.2024.11.007
- Mar 1, 2025
- Journal of Oral and Maxillofacial Surgery
Evaluating Artificial Intelligence Chatbots in Oral and Maxillofacial Surgery Board Exams: Performance and Potential
- Research Article
- 10.1097/corr.0000000000002704
- May 23, 2023
- Clinical Orthopaedics and Related Research
Advances in neural networks, deep learning, and artificial intelligence (AI) have progressed recently. Previous deep learning AI has been domain-specific, trained on datasets in narrow areas of interest to yield high accuracy and precision. A new AI model built on large language models (LLMs) and not restricted to a specific domain, ChatGPT (OpenAI), has gained attention. Although AI has demonstrated proficiency in managing vast amounts of data, implementation of that knowledge remains a challenge. (1) What percentage of Orthopaedic In-Training Examination questions can a generative, pretrained transformer chatbot (ChatGPT) answer correctly? (2) How does that percentage compare with results achieved by orthopaedic residents of different levels, and if scoring lower than the 10th percentile relative to 5th-year residents is likely to correspond to a failing American Board of Orthopaedic Surgery score, is this LLM likely to pass the orthopaedic surgery written boards? (3) Does increasing question taxonomy affect the LLM's ability to select the correct answer choices? This study randomly selected 400 of 3840 publicly available questions based on the Orthopaedic In-Training Examination and compared the mean score with that of residents who took the test over a 5-year period. Questions with figures, diagrams, or charts were excluded, as were five questions the LLM could not provide an answer for, resulting in 207 administered questions with raw scores recorded. The LLM's answer results were compared with the Orthopaedic In-Training Examination ranking of orthopaedic surgery residents. Based on the findings of an earlier study, a pass-fail cutoff was set at the 10th percentile. Questions answered were then categorized based on the Buckwalter taxonomy of recall, which deals with increasingly complex levels of interpretation and application of knowledge; the LLM's performance was compared across taxonomic levels and analyzed using a chi-square test. ChatGPT selected the correct answer 47% (97 of 207) of the time, and 53% (110 of 207) of the time it answered incorrectly. Based on prior Orthopaedic In-Training Examination testing, the LLM scored in the 40th percentile for postgraduate year (PGY) 1s, the eighth percentile for PGY2s, and the first percentile for PGY3s, PGY4s, and PGY5s; based on the latter finding (and using a predefined cutoff of the 10th percentile of PGY5s as the threshold for a passing score), it seems unlikely that the LLM would pass the written board examination. The LLM's performance decreased as question taxonomy level increased (it answered 54% [54 of 101] of Tax 1 questions correctly, 51% [18 of 35] of Tax 2 questions correctly, and 34% [24 of 71] of Tax 3 questions correctly; p = 0.034). Although this general-domain LLM has a low likelihood of passing the orthopaedic surgery board examination, its testing performance and knowledge are comparable to those of a first-year orthopaedic surgery resident. The LLM's ability to provide accurate answers declines with increasing question taxonomy and complexity, indicating a deficiency in implementing knowledge. Current AI appears to perform better at knowledge- and interpretation-based inquiries, and, based on this study and other areas of opportunity, it may become an additional tool for orthopaedic learning and education.
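The reported taxonomy effect can be sanity-checked from the quoted counts alone. Below is a minimal sketch, assuming only the numbers in the abstract (54 of 101, 18 of 35, 24 of 71) and a standard chi-square test of independence; the study's own analysis pipeline is not shown here, so treat this as an illustration rather than a reproduction.

```python
# A minimal check of the taxonomy-level comparison, using only the counts
# quoted in the abstract. Applies a standard chi-square test of independence
# to the reported contingency table.
from scipy.stats import chi2_contingency

# Rows: taxonomy levels 1-3; columns: [correct, incorrect]
table = [
    [54, 101 - 54],  # Tax 1: recall
    [18, 35 - 18],   # Tax 2: interpretation
    [24, 71 - 24],   # Tax 3: application
]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, p = {p:.3f}")
# Lands in the vicinity of the reported p = 0.034: accuracy declines as
# taxonomy level (question complexity) increases.
```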
- Discussion
- 10.1016/j.ebiom.2023.104672
- Jul 1, 2023
- eBioMedicine
Response to M. Trengove and colleagues regarding "Attention is not all you need: the complicated case of ethically using large language models in healthcare and medicine".
- Research Article
- 10.3205/zma001702
- Jan 1, 2024
- GMS Journal for Medical Education
The high performance of generative artificial intelligence (AI) and large language models (LLMs) in examination contexts has triggered an intense debate about their applications, effects, and risks. What legal aspects need to be considered when using LLMs in teaching and assessment? What possibilities do language models offer? The use of LLMs is assessed against the relevant statutes and laws: university statutes, state higher education laws, licensing regulations for doctors, the Copyright Act (UrhG), the General Data Protection Regulation (GDPR), and the AI Regulation (EU AI Act). LLMs and AI offer opportunities but require clear university frameworks. These should define legitimate uses and areas where use is prohibited. Cheating and plagiarism violate good scientific practice and copyright law; cheating is difficult to detect, plagiarism by AI is possible, and users of these products are responsible for it. LLMs are effective tools for generating exam questions. Nevertheless, careful review is necessary, as even apparently high-quality products may contain errors. However, the risk of copyright infringement with AI-generated exam questions is low, as copyright law allows up to 15% of a protected work to be used for teaching and exams. The grading of exam content is subject to higher education laws and regulations and the GDPR; exclusively computer-based assessment without human review is not permitted. For high-risk applications in education, the EU's AI Regulation will apply in the future. When dealing with LLMs in assessments, evaluation criteria for existing assessments can be adapted, as can assessment programmes, e.g. to reduce the motivation to cheat. LLMs can also become the subject of the examination themselves. Teachers should undergo further training in AI and treat LLMs as a supplement.
- Research Article
- 10.1016/j.cpa.2024.102722
- Feb 22, 2024
- Critical Perspectives on Accounting
New large language models (LLMs) like ChatGPT have the potential to change qualitative research by contributing to every stage of the research process from generating interview questions to structuring research publications. However, it is far from clear whether such ‘assistance’ will enable or deskill and eventually displace the qualitative researcher. This paper sets out to explore the implications for qualitative research of the recently emerged capabilities of LLMs; how they have acquired their seemingly ‘human-like’ capabilities to ‘converse’ with us humans, and in what ways these capabilities are deceptive or misleading. Building on a comparison of the different ‘trainings’ of humans and LLMs, the paper first traces the seemingly human-like qualities of the LLM to the human proclivity to project communicative intent into or onto LLMs’ purely imitative capacity to predict the structure of human communication. It then goes on to detail the ways in which such human-like communication is deceptive and misleading in relation to the absolute ‘certainty’ with which LLMs ‘converse’, their intrinsic tendencies to ‘hallucination’ and ‘sycophancy’, the narrow conception of ‘artificial intelligence’, LLMs’ complete lack of ethical sensibility or capacity for responsibility, and finally the feared danger of an ‘emergence’ of ‘human-competitive’ or ‘superhuman’ LLM capabilities. The paper concludes by noting the potential dangers of the widespread use of LLMs as ‘mediators’ of human self-understanding and culture. A postscript offers a brief reflection on what only humans can do as qualitative researchers.
- Abstract
- 10.1182/blood-2024-208513
- Nov 5, 2024
- Blood
Evaluating the Accuracy of Artificial Intelligence (AI)-Generated Synopses for Plasma Cell Disorder Treatment Regimens
- Discussion
- 10.1111/cogs.13430
- Mar 1, 2024
- Cognitive Science
This letter explores the intricate historical and contemporary links between large language models (LLMs) and cognitive science through the lens of information theory, statistical language models, and socioanthropological linguistic theories. The emergence of LLMs highlights the enduring significance of information-based and statistical learning theories in understanding human communication. These theories, initially proposed in the mid-20th century, offered a visionary framework for integrating computational science, social sciences, and humanities, a vision that was nonetheless not fully realized at the time. The subsequent development of sociolinguistics and linguistic anthropology, especially since the 1970s, provided critical perspectives and empirical methods that both challenged and enriched this framework. This letter proposes that two pivotal concepts derived from this development, metapragmatic function and indexicality, offer a fruitful theoretical perspective for integrating the semantic, textual, and pragmatic, contextual dimensions of communication, an amalgamation that contemporary LLMs have yet to fully achieve. The author believes that contemporary cognitive science is at a crucial crossroads, where fostering interdisciplinary dialogue among computational linguistics, sociolinguistics and linguistic anthropology, and cognitive and social psychology is particularly imperative. Such collaboration is vital to bridge the computational, cognitive, and sociocultural aspects of human communication and human-AI interaction, especially in the era of large language and multimodal models and human-centric Artificial Intelligence (AI).
- Research Article
- 10.1088/1361-6552/ad1fa2
- Feb 6, 2024
- Physics Education
With the rapid evolution of artificial intelligence (AI), its potential implications for higher education have become a focal point of interest. This study delves into the capabilities of AI in physics education and offers actionable AI policy recommendations. Using OpenAI's flagship gpt-3.5-turbo large language model (LLM), we assessed its ability to answer 1337 physics exam questions spanning General Certificate of Secondary Education (GCSE), A-Level, and introductory university curricula. We employed various AI prompting techniques: zero-shot, in-context learning, and confirmatory checking, which merges chain-of-thought reasoning with reflection. The proficiency of gpt-3.5-turbo varied across academic levels: it scored an average of 83.4% on GCSE, 63.8% on A-Level, and 37.4% on university-level questions, with an overall average of 59.9% using the most effective prompting technique. In a separate test, the LLM's accuracy on 5000 mathematical operations was found to be 45.2%. When evaluated as a marking tool, the LLM's concordance with human markers averaged 50.8%, with notable inaccuracies in marking straightforward questions, such as multiple choice. Given these results, our recommendations underscore caution: while current LLMs can consistently perform well on physics questions at earlier educational stages, their efficacy diminishes with advanced content and complex calculations. LLM outputs often showcase novel methods not in the syllabus, excessive verbosity, and miscalculations in basic arithmetic. This suggests that at university level there is no substantial threat from LLMs to non-invigilated physics questions. However, given LLMs' considerable proficiency in writing physics essays and their coding abilities, non-invigilated examinations of these skills in physics are highly vulnerable to automated completion by LLMs. This vulnerability also extends to physics questions pitched at lower academic levels. It is thus recommended that educators be transparent with their students about LLM capabilities, while emphasizing caution against overreliance on LLM output due to its tendency to sound plausible but be incorrect.
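As a concrete illustration of the prompting techniques named above, the sketch below shows one plausible shape for "confirmatory checking": a first pass elicits step-by-step (chain-of-thought) reasoning, and a second pass asks the model to reflect on and verify its own working. The study's exact prompts are not reproduced here, and `call_llm` is a hypothetical stand-in for whatever chat-completion client (for example, one wrapping gpt-3.5-turbo) is available.

```python
# A hedged sketch of "confirmatory checking": chain-of-thought reasoning
# followed by a reflection pass. All prompts are illustrative, not the
# study's; call_llm is a hypothetical stand-in for a real chat API client.

def call_llm(prompt: str) -> str:
    # Placeholder so the example runs end to end; substitute a real
    # provider call in practice.
    return f"<model output for prompt of {len(prompt)} chars>"

def confirmatory_check(question: str) -> str:
    # Pass 1: elicit step-by-step (chain-of-thought) reasoning.
    draft = call_llm(
        "Answer the following physics question. Reason step by step, "
        "then state a final answer.\n\n" + question
    )
    # Pass 2: reflection - have the model verify its own working, with
    # attention to arithmetic, a weak spot the study notes.
    return call_llm(
        "Below is a question and a worked answer. Check each step for "
        "errors, especially arithmetic. If the answer is wrong, give a "
        "corrected final answer; otherwise restate it.\n\n"
        f"Question: {question}\n\nWorked answer: {draft}"
    )

print(confirmatory_check("A 2 kg mass accelerates at 3 m/s^2. Net force?"))
```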
- Research Article
- 10.1177/2473011425s00142
- Oct 1, 2025
- Foot & Ankle Orthopaedics
Research Type: Level 3 - Retrospective cohort study, Case-control study, Meta-analysis of Level 3 studies Introduction/Purpose: Identifying and tracking surgical complications is a critical component of maintaining quality registries and improving clinical care. Medical record review to record complications is often performed by a dedicated clinical team and is time-consuming and expensive. In recent years, Large Language Models (LLMs) have emerged as a promising Artificial Intelligence (AI) tool to more efficiently and accurately retrieve clinical information from patient records. However, early literature has shown that without careful prompt design, LLMs are vulnerable to error. The primary purpose of this study is to determine whether an LLM platform, compared with traditional clinical chart reviewers, can reliably and automatically screen for complications directly from medical notes for patients who underwent total ankle arthroplasty (TAA). Methods: Following IRB approval, patient records were retrospectively identified from an institutional TAA registry with surgeries performed from 2015 to 2024. Patients were manually evaluated by the research team for intraoperative fracture (IOF), deep vein thrombosis (DVT), superficial wound infection (SWI), and deep wound infection (DWI). Patient records were then scrubbed of HIPAA identifiers, age, and gender, and input into an LLM for analysis. An automated script was developed that assessed each patient visit note for complications and recorded the result in table format as "yes" or "no" for each complication. Disagreements between reviewer and LLM findings were secondarily reviewed by a blinded investigator to produce a final, gold-standard data set. The sensitivity and specificity of both reviewer and LLM chart review were compared. Statistical difference between the groups was determined using a McNemar test, and similarity of decisions was evaluated using an Intraclass Correlation Coefficient (ICC). Results: A total of 1952 notes were reviewed for 310 TAA procedures. The final rates of IOF, DVT, SWI, and DWI were found to be 5.2%, 0.0%, 4.2%, and 0.6%, respectively. Chart reviewers had high agreement with the LLM in evaluating DVT (100% match, ICC 1.00) and DWI (99.7% match, ICC 0.88), but significant disagreement in the rates of IOF (97.1% match, ICC 0.77, p = 0.008) and SWI (91.9% match, ICC 0.49, p = 0.05). After secondary review by a blinded author, the LLM was found to have a higher sensitivity than reviewers for SWI (0.85 vs. 0.69, respectively) and IOF (1.00 vs. 0.50, respectively). However, reviewers had a higher specificity than the LLM for SWI (0.98 vs. 0.95, respectively). Conclusion: Our results support that current LLMs can be applied to screen free-text medical records for complications at a sensitivity and specificity comparable to clinical chart reviewers. The LLM is more prone to both true and false positives, indicated by both a higher sensitivity and a lower specificity. Importantly, this assessment of complications using the LLM was completely automated and did not require human intervention to run, allowing substantially higher efficiency. LLMs show promise to dramatically scale the size and reliability of clinical outcome and quality registries in coming years. Further refinement of the LLM script may improve accuracy.
Comparing identified complications following total ankle arthroplasty between manual chart review and a large language model: rates of intraoperative fracture (IOF), deep vein thrombosis (DVT), superficial wound infection (SWI), and deep wound infection (DWI) identified by manual chart reviewers and a large language model (LLM). Sensitivity and specificity for each are provided relative to a verified gold standard established by a fellowship-trained, blinded investigator.
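For readers who want to reproduce the style of evaluation described here, the sketch below computes sensitivity and specificity against a gold standard and an uncorrected McNemar statistic for paired reviewer-versus-LLM decisions. The yes/no vectors are invented for illustration; none of the study's data appear.

```python
# Minimal sketch of the evaluation metrics described above. The boolean
# vectors are illustrative stand-ins, not the study's data.

def sens_spec(pred, gold):
    """Sensitivity and specificity of `pred` against gold-standard labels."""
    tp = sum(p and g for p, g in zip(pred, gold))
    tn = sum(not p and not g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(not p and g for p, g in zip(pred, gold))
    return tp / (tp + fn), tn / (tn + fp)

def mcnemar_stat(a, b):
    """Uncorrected McNemar chi-square from the two discordant cell counts."""
    only_a = sum(x and not y for x, y in zip(a, b))
    only_b = sum(y and not x for x, y in zip(a, b))
    d = only_a + only_b
    return (only_a - only_b) ** 2 / d if d else 0.0

# Toy example: 10 cases, gold standard vs. LLM and human-reviewer calls.
gold     = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
llm      = [1, 0, 1, 1, 0, 0, 1, 1, 0, 0]  # more positives: higher sens, lower spec
reviewer = [1, 0, 0, 0, 0, 0, 0, 1, 0, 0]

print("LLM      sens/spec:", sens_spec(llm, gold))
print("Reviewer sens/spec:", sens_spec(reviewer, gold))
print("McNemar chi2 (LLM vs reviewer):", mcnemar_stat(llm, reviewer))
```

On these toy vectors the LLM comes out more sensitive and less specific than the reviewer, mirroring the pattern the abstract reports.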
- Abstract
- 10.1182/blood-2023-185854
- Nov 2, 2023
- Blood
Evaluating the Performance of Large Language Models in Hematopoietic Stem Cell Transplantation Decision Making
- Research Article
- 10.2196/56764
- Apr 25, 2024
- Journal of Medical Internet Research
As the health care industry increasingly embraces large language models (LLMs), understanding the consequences of this integration becomes crucial for maximizing benefits while mitigating potential pitfalls. This paper explores the evolving relationship among clinician trust in LLMs, the transition of data sources from predominantly human-generated to artificial intelligence (AI)–generated content, and the subsequent impact on the performance of LLMs and clinician competence. One of the primary concerns identified in this paper is the LLMs’ self-referential learning loops, where AI-generated content feeds into the learning algorithms, threatening the diversity of the data pool, potentially entrenching biases, and reducing the efficacy of LLMs. While theoretical at this stage, this feedback loop poses a significant challenge as the integration of LLMs in health care deepens, emphasizing the need for proactive dialogue and strategic measures to ensure the safe and effective use of LLM technology. Another key takeaway from our investigation is the role of user expertise and the necessity for a discerning approach to trusting and validating LLM outputs. The paper highlights how expert users, particularly clinicians, can leverage LLMs to enhance productivity by off-loading routine tasks while maintaining critical oversight to identify and correct potential inaccuracies in AI-generated content. This balance of trust and skepticism is vital for ensuring that LLMs augment rather than undermine the quality of patient care. We also discuss the risks associated with the deskilling of health care professionals. Frequent reliance on LLMs for critical tasks could result in a decline in health care providers’ diagnostic and thinking skills, particularly affecting the training and development of future professionals. The legal and ethical considerations surrounding the deployment of LLMs in health care are also examined. We discuss the medicolegal challenges, including liability in cases of erroneous diagnoses or treatment advice generated by LLMs. The paper references recent legislative efforts, such as The Algorithmic Accountability Act of 2023, as crucial steps toward establishing a framework for the ethical and responsible use of AI-based technologies in health care. In conclusion, this paper advocates for a strategic approach to integrating LLMs into health care. By emphasizing the importance of maintaining clinician expertise, fostering critical engagement with LLM outputs, and navigating the legal and ethical landscape, we can ensure that LLMs serve as valuable tools in enhancing patient care and supporting health care professionals. This approach addresses the immediate challenges posed by integrating LLMs and sets a foundation for their sustainable and responsible use in the future.
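The self-referential loop the authors warn about can be made concrete with a toy simulation. The sketch below is emphatically not the paper's model: it assumes a one-dimensional "content" distribution, refits a Gaussian each generation, and mimics generative models' tendency to over-produce typical content by dropping the tails, so the diversity (standard deviation) of the data pool visibly shrinks over generations.

```python
# Toy sketch of a self-referential learning loop (not the paper's model):
# each generation, a "model" is fit to the pool and regenerates it, but
# over-produces typical content (samples beyond 1.5 sigma are dropped).
import numpy as np

rng = np.random.default_rng(0)
pool = rng.normal(0.0, 1.0, 20_000)  # generation 0: human-generated content

print(f"generation 0: std = {pool.std():.3f}, n = {pool.size}")
for gen in range(1, 7):
    mu, sigma = pool.mean(), pool.std()          # "train" on the current pool
    samples = rng.normal(mu, sigma, pool.size)   # AI-generated content
    pool = samples[np.abs(samples - mu) < 1.5 * sigma]  # typicality bias
    print(f"generation {gen}: std = {pool.std():.3f}, n = {pool.size}")
```

Each round, the truncation cuts the pool's standard deviation by roughly a quarter, so diversity collapses toward the mode, a simple stand-in for the feared loss of data diversity.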
- Research Article
- 10.1016/j.jclinepi.2025.111746
- May 1, 2025
- Journal of Clinical Epidemiology
Machine learning promises versatile help in the creation of systematic reviews (SRs). Recently, further developments in the form of large language models (LLMs) and their application in SR conduct have attracted attention. We aimed to provide an overview of LLM applications in SR conduct in health research. We systematically searched MEDLINE, Web of Science, IEEEXplore, ACM Digital Library, Europe PMC (preprints), and Google Scholar, and conducted an additional hand search (last search: February 26, 2024). We included scientific articles in English or German, published from April 2021 onwards, building upon the results of an earlier mapping review that had not yet identified LLM applications to support SRs. Two reviewers independently screened studies for eligibility; after piloting, one reviewer extracted data, checked by another. Our database search yielded 8054 hits, and we identified 33 articles from our hand search. We finally included 37 articles on LLM support. LLM approaches covered 10 of 13 defined SR steps, most frequently literature search (n = 15, 41%), study selection (n = 14, 38%), and data extraction (n = 11, 30%). The most frequently used LLM was the Generative Pretrained Transformer (GPT) (n = 33, 89%). Validation studies were predominant (n = 21, 57%). In half of the studies, authors evaluated LLM use as promising (n = 20, 54%), one-quarter as neutral (n = 9, 24%), and one-fifth as nonpromising (n = 8, 22%). Although LLMs show promise in supporting SR creation, fully established or validated applications are often lacking. The rapid increase in research on LLMs for evidence synthesis production highlights their growing relevance. Systematic reviews are a crucial tool in health research, where experts carefully collect and analyze all available evidence on a specific research question. Creating these reviews is typically time- and resource-intensive, often taking months or even years to complete, as researchers must thoroughly search, evaluate, and synthesize an immense number of scientific studies. For the present article, we conducted a review to understand how new artificial intelligence (AI) tools, specifically large language models (LLMs) like the Generative Pretrained Transformer (GPT), can be used to help create systematic reviews in health research. We searched multiple scientific databases and finally found 37 relevant articles. We found that LLMs have been tested to help with various parts of the systematic review process, particularly in three main areas: searching scientific literature (41% of studies), selecting relevant studies (38%), and extracting important information from these studies (30%). GPT was the most commonly used LLM, appearing in 89% of the studies. Most of the research (57%) focused on testing whether these AI tools actually work as intended in the context of systematic review production. The results were mixed: about half of the studies found LLMs promising, a quarter were neutral, and one-fifth found them not promising. While LLMs show potential for making the systematic review process more efficient, there is still a lack of fully tested and validated applications. However, the increasing number of studies in this field suggests that these AI tools are becoming increasingly important in creating systematic reviews.
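To make the "study selection" use case above concrete, here is a minimal sketch of LLM-assisted title-and-abstract screening. Everything in it is illustrative: the inclusion criteria, the labels, and `call_llm`, which stands in for a real chat-completion client; the reviewed studies used a variety of prompts and models (most often GPT).

```python
# Illustrative sketch of LLM-assisted study selection for a systematic
# review. Criteria, records, and call_llm are hypothetical placeholders.

CRITERIA = (
    "Include only randomized controlled trials in adult humans that "
    "evaluate an LLM-based intervention; exclude everything else."
)  # hypothetical inclusion criteria

def call_llm(prompt: str) -> str:
    # Stand-in for a real chat-completion call; returns a canned reply
    # here so the example runs without credentials.
    return "EXCLUDE: observational design, not a randomized trial"

def screen(records):
    """Ask the LLM for an INCLUDE/EXCLUDE decision per title/abstract."""
    decisions = []
    for title, abstract in records:
        prompt = (
            f"Screening criteria: {CRITERIA}\n\n"
            f"Title: {title}\nAbstract: {abstract}\n\n"
            "Reply with INCLUDE or EXCLUDE, a colon, and a one-line reason."
        )
        reply = call_llm(prompt)
        label = reply.split(":", 1)[0].strip()
        decisions.append((title, label, reply))
    return decisions

demo = [("Chart review with GPT-4: a cohort study",
         "We retrospectively screened 1,952 clinical notes...")]
for title, label, reply in screen(demo):
    print(f"{label:8} | {title}")
```

In practice, the reviewed validation studies treat such LLM decisions as a first pass to be checked against human screening, not as a replacement for it.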