Co-Adaptation in an Ecosystem of Human–Machine Dyadic Interaction
In August 2025, The Washington Post published a headline that captured widespread public attention: “It’s happening: People are starting to talk like ChatGPT” (Aleksic, 2025). Reporting on findings by Juzek and Ward (2025), the article described how lexical items overrepresented in ChatGPT’s output such as delve were appearing with increasing frequency in human speech. Striking as this discovery may be, it is not entirely surprising. Artificial intelligence (AI) systems have rapidly permeated everyday life, and large language models (LLMs) such as ChatGPT are now used by millions for writing, learning, planning, and problem-solving. Many users engage with these systems not once, but repeatedly over time.
- Research Article
11
- 10.1287/ijds.2023.0007
- Apr 1, 2023
- INFORMS Journal on Data Science
How Can <i>IJDS</i> Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
- Research Article
27
- 10.1162/daed_e_01897
- May 1, 2022
- Daedalus
This dialogue is from an early scene in the 2014 film Ex Machina, in which Nathan has invited Caleb to determine whether Nathan has succeeded in creating artificial intelligence.1 The achievement of powerful artificial general intelligence has long held a grip on our imagination not only for its exciting as well as worrisome possibilities, but also for its suggestion of a new, uncharted era for humanity. In opening his 2021 BBC Reith Lectures, titled "Living with Artificial Intelligence," Stuart Russell states that "the eventual emergence of general-purpose artificial intelligence [will be] the biggest event in human history."2Over the last decade, a rapid succession of impressive results has brought wider public attention to the possibilities of powerful artificial intelligence. In machine vision, researchers demonstrated systems that could recognize objects as well as, if not better than, humans in some situations. Then came the games. Complex games of strategy have long been associated with superior intelligence, and so when AI systems beat the best human players at chess, Atari games, Go, shogi, StarCraft, and Dota, the world took notice. It was not just that Als beat humans (although that was astounding when it first happened), but the escalating progression of how they did it: initially by learning from expert human play, then from self-play, then by teaching themselves the principles of the games from the ground up, eventually yielding single systems that could learn, play, and win at several structurally different games, hinting at the possibility of generally intelligent systems.3Speech recognition and natural language processing have also seen rapid and headline-grabbing advances. Most impressive has been the emergence recently of large language models capable of generating human-like outputs. Progress in language is of particular significance given the role language has always played in human notions of intelligence, reasoning, and understanding. While the advances mentioned thus far may seem abstract, those in driverless cars and robots have been more tangible given their embodied and often biomorphic forms. Demonstrations of such embodied systems exhibiting increasingly complex and autonomous behaviors in our physical world have captured public attention.Also in the headlines have been results in various branches of science in which AI and its related techniques have been used as tools to advance research from materials and environmental sciences to high energy physics and astronomy.4 A few highlights, such as the spectacular results on the fifty-year-old protein-folding problem by AlphaFold, suggest the possibility that AI could soon help tackle science's hardest problems, such as in health and the life sciences.5While the headlines tend to feature results and demonstrations of a future to come, AI and its associated technologies are already here and pervade our daily lives more than many realize. Examples include recommendation systems, search, language translators - now covering more than one hundred languages - facial recognition, speech to text (and back), digital assistants, chatbots for customer service, fraud detection, decision support systems, energy management systems, and tools for scientific research, to name a few. In all these examples and others, AI-related techniques have become components of other software and hardware systems as methods for learning from and incorporating messy real-world inputs into inferences, predictions, and, in some cases, actions. As director of the Future of Humanity Institute at the University of Oxford, Nick Bostrom noted back in 2006, "A lot of cutting-edge AI has filtered into general applications, often without being called AI because once something becomes useful enough and common enough it's not labeled AI anymore."6As the scope, use, and usefulness of these systems have grown for individual users, researchers in various fields, companies and other types of organizations, and governments, so too have concerns when the systems have not worked well (such as bias in facial recognition systems), or have been misused (as in deepfakes), or have resulted in harms to some (in predicting crime, for example), or have been associated with accidents (such as fatalities from self-driving cars).7Dædalus last devoted a volume to the topic of artificial intelligence in 1988, with contributions from several of the founders of the field, among others. Much of that issue was concerned with questions of whether research in AI was making progress, of whether AI was at a turning point, and of its foundations, mathematical, technical, and philosophical-with much disagreement. However, in that volume there was also a recognition, or perhaps a rediscovery, of an alternative path toward AI - the connectionist learning approach and the notion of neural nets-and a burgeoning optimism for this approach's potential. Since the 1960s, the learning approach had been relegated to the fringes in favor of the symbolic formalism for representing the world, our knowledge of it, and how machines can reason about it. Yet no essay captured some of the mood at the time better than Hilary Putnam's "Much Ado About Not Very Much." Putnam questioned the Dædalus issue itself: "Why a whole issue of Dædalus? Why don't we wait until AI achieves something and then have an issue?" He concluded:This volume of Dædalus is indeed the first since 1988 to be devoted to artificial intelligence. This volume does not rehash the same debates; much else has happened since, mostly as a result of the success of the machine learning approach that was being rediscovered and reimagined, as discussed in the 1988 volume. This issue aims to capture where we are in AI's development and how its growing uses impact society. The themes and concerns herein are colored by my own involvement with AI. Besides the television, films, and books that I grew up with, my interest in AI began in earnest in 1989 when, as an undergraduate at the University of Zimbabwe, I undertook a research project to model and train a neural network.9 I went on to do research on AI and robotics at Oxford. Over the years, I have been involved with researchers in academia and labs developing AI systems, studying AI's impact on the economy, tracking AI's progress, and working with others in business, policy, and labor grappling with its opportunities and challenges for society.10The authors of the twenty-five essays in this volume range from AI scientists and technologists at the frontier of many of AI's developments to social scientists at the forefront of analyzing AI's impacts on society. The volume is organized into ten sections. Half of the sections are focused on AI's development, the other half on its intersections with various aspects of society. In addition to the diversity in their topics, expertise, and vantage points, the authors bring a range of views on the possibilities, benefits, and concerns for society. I am grateful to the authors for accepting my invitation to write these essays.Before proceeding further, it may be useful to say what we mean by artificial intelligence. The headlines and increasing pervasiveness of AI and its associated technologies have led to some conflation and confusion about what exactly counts as AI. This has not been helped by the current trend-among researchers in science and the humanities, startups, established companies, and even governments-to associate anything involving not only machine learning, but data science, algorithms, robots, and automation of all sorts with AI. This could simply reflect the hype now associated with AI, but it could also be an acknowledgment of the success of the current wave of AI and its related techniques and their wide-ranging use and usefulness. I think both are true; but it has not always been like this. In the period now referred to as the AI winter, during which progress in AI did not live up to expectations, there was a reticence to associate most of what we now call AI with AI.Two types of definitions are typically given for AI. The first are those that suggest that it is the ability to artificially do what intelligent beings, usually human, can do. For example, artificial intelligence is:The human abilities invoked in such definitions include visual perception, speech recognition, the capacity to reason, solve problems, discover meaning, generalize, and learn from experience. Definitions of this type are considered by some to be limiting in their human-centricity as to what counts as intelligence and in the benchmarks for success they set for the development of AI (more on this later). The second type of definitions try to be free of human-centricity and define an intelligent agent or system, whatever its origin, makeup, or method, as:This type of definition also suggests the pursuit of goals, which could be given to the system, self-generated, or learned.13 That both types of definitions are employed throughout this volume yields insights of its own.These definitional distinctions notwithstanding, the term AI, much to the chagrin of some in the field, has come to be what cognitive and computer scientist Marvin Minsky called a "suitcase word."14 It is packed variously, depending on who you ask, with approaches for achieving intelligence, including those based on logic, probability, information and control theory, neural networks, and various other learning, inference, and planning methods, as well as their instantiations in software, hardware, and, in the case of embodied intelligence, systems that can perceive, move, and manipulate objects.Three questions cut through the discussions in this volume: 1) Where are we in AI's development? 2) What opportunities and challenges does AI pose for society? 3) How much about AI is really about us?Notions of intelligent machines date all the way back to antiquity.15 Philosophers, too, among them Hobbes, Leibnitz, and Descartes, have been dreaming about AI for a long time; Daniel Dennett suggests that Descartes may have even anticipated the Turing Test.16 The idea of computation-based machine intelligence traces to Alan Turing's invention of the universal Turing machine in the 1930s, and to the ideas of several of his contemporaries in the mid-twentieth century. But the birth of artificial intelligence as we know it and the use of the term is generally attributed to the now famed Dartmouth summer workshop of 1956. The workshop was the result of a proposal for a two-month summer project by John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon whereby "An attempt will be made to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves."17In their respective contributions to this volume, "From So Simple a Beginning: Species of Artificial Intelligence" and "If We Succeed," and in different but complementary ways, Nigel Shadbolt and Stuart Russell chart the key ideas and developments in AI, its periods of excitement as well as the aforementioned AI winters. The current AI spring has been underway since the 1990s, with headline-grabbing breakthroughs appearing in rapid succession over the last ten years or so: a period that Jeffrey Dean describes in the title of his essay as a "golden decade," not only for the pace of AI development but also its use in a wide range of sectors of society, as well as areas of scientific research.18 This period is best characterized by the approach to achieve artificial intelligence through learning from experience, and by the success of neural networks, deep learning, and reinforcement learning, together with methods from probability theory, as ways for machines to learn.19A brief history may be useful here: In the 1950s, there were two dominant visions of how to achieve machine intelligence. One vision was to use computers to create a logic and symbolic representation of the world and our knowledge of it and, from there, create systems that could reason about the world, thus exhibiting intelligence akin to the mind. This vision was most espoused by Allen Newell and Hebert Simon, along with Marvin Minsky and others. Closely associated with it was the "heuristic search" approach that supposed intelligence was essentially a problem of exploring a space of possibilities for answers. The second vision was inspired by the brain, rather than the mind, and sought to achieve intelligence by learning. In what became known as the connectionist approach, units called perceptrons were connected in ways inspired by the connection of neurons in the brain. At the time, this approach was most associated with Frank Rosenblatt. While there was initial excitement about both visions, the first came to dominate, and did so for decades, with some successes, including so-called expert systems.Not only did this approach benefit from championing by its advocates and plentiful funding, it came with the suggested weight of a long intellectual tradition-exemplified by Descartes, Boole, Frege, Russell, and Church, among others-that sought to manipulate symbols and to formalize and axiomatize knowledge and reasoning. It was only in the late 1980s that interest began to grow again in the second vision, largely through the work of David Rumelhart, Geoffrey Hinton, James McClelland, and others. The history of these two visions and the associated philosophical ideas are discussed in Hubert Dreyfus and Stuart Dreyfus's 1988 Dædalus essay "Making a Mind Versus Modeling the Brain: Artificial Intelligence Back at a Branchpoint."20 Since then, the approach to intelligence based on learning, the use of statistical methods, back-propagation, and training (supervised and unsupervised) has come to characterize the current dominant approach.Kevin Scott, in his essay "I Do Not Think It Means What You Think It Means: Artificial Intelligence, Cognitive Work & Scale," reminds us of the work of Ray Solomonoff and others linking information and probability theory with the idea of machines that can not only learn, but compress and potentially generalize what they learn, and the emerging realization of this in the systems now being built and those to come. The success of the machine learning approach has benefited from the boon in the availability of data to train the algorithms thanks to the growth in the use of the Internet and other applications and services. In research, the data explosion has been the result of new scientific instruments and observation platforms and data-generating breakthroughs, for example, in astronomy and in genomics. Equally important has been the co-evolution of the software and hardware used, especially chip architectures better suited to the parallel computations involved in data- and compute-intensive neural networks and other machine learning approaches, as Dean discusses.Several authors delve into progress in key subfields of AI.21 In their essay, "Searching for Computer Vision North Stars," Fei-Fei Li and Ranjay Krishna chart developments in machine vision and the creation of standard data sets such as ImageNet that could be used for benchmarking performance. In their respective essays "Human Language Understanding & Reasoning" and "The Curious Case of Commonsense Intelligence," Chris Manning and Yejin Choi discuss different eras and ideas in natural language processing, including the recent emergence of large language models comprising hundreds of billions of parameters and that use transformer architectures and self-supervised learning on vast amounts of data.22 The resulting pretrained models are impressive in their capacity to take natural language prompts for which they have not been trained specifically and generate human-like outputs, not only in natural language, but also images, software code, and more, as Mira Murati discusses and illustrates in "Language & Coding Creativity." Some have started to refer to these large language models as foundational models in that once they are trained, they are adaptable to a wide range of tasks and outputs.23 But despite their unexpected performance, these large language models are still early in their development and have many shortcomings and limitations that are highlighted in this volume and elsewhere, including by some of their developers.24In "The Machines from Our Future," Daniela Rus discusses the progress in robotic systems, including advances in the underlying technologies, as well as in their integrated design that enables them to operate in the physical world. She highlights the limitations in the "industrial" approaches used thus far and suggests new ways of conceptualizing robots that draw on insights from biological systems. In robotics, as in AI more generally, there has always been a tension as to whether to copy or simply draw inspiration from how humans and other biological organisms achieve intelligent behavior. Elsewhere, AI researcher Demis Hassabis and colleagues have explored how neuroscience and AI learn from and inspire each other, although so far more in one than the other, as and have the success of the current approaches to AI, there are still many shortcomings and as well as problems in It is useful to on one such as when AI does not as or or or that can to or when it on or information about the world, or when it has such as of all of which can to a of public shortcomings have captured the attention of the wider public and as well as among there is an on AI and In recent years, there has been a of to principles and approaches to AI, as well as involving and such as the on AI, that to best important has been the of with to and - in the and developing AI in both and as has been well in recent This is an important in its own but also with to the of the resulting AI and, in its intersections with more the other there are limitations and problems associated with the that AI is not capable of if could to more more or more general AI. In their Turing deep learning and Geoffrey took of where deep learning and highlighted its current such as the with In the case of natural language processing, Manning and Choi the challenges in and despite the of large language Elsewhere, and have the notion that large language models do anything learning, or In & of in a and discuss the problems in systems, the as how to reason about other their systems, and well as challenges in both and especially when the include both humans and Elsewhere, and others a useful of the problems in there is a growing among many that we do not have for the of AI systems, especially as they become more capable and the of use although AI and its related techniques are to be powerful tools for research in science, as examples in this volume and recent examples in which AI not only help results but also by design and become what some have AI to science and and to and challenges for the possibility that more powerful AI could to new in science, as well as progress in some of challenges and has long been a key for many at the frontier of AI research to more capable the of each of AI, the of more general problems that to the possibility of more capable AI learning, reasoning, of and and of these and other problems that could to more capable systems the of whether current characterized by deep learning, the of and and more foundational and and reinforcement or whether different approaches are in such as cognitive agent approaches or or based on logic and probability theory, to name a few. whether and what of approaches be the AI is but many the current along with of and learning architectures have to their about the of the current approaches is associated with the of whether artificial general intelligence can be and if how and Artificial general intelligence is in to what is called that AI and for tasks and goals, such as The development of on the other aims for more powerful AI - at as powerful as is generally to problem or and, in some the capacity to and improve as well as set and its own and the of and when will be is a for most that its achievement have and as is often in and such as A through and The to Ex and it is or there is growing among many at the frontier of AI research that we for the possibility of powerful with to and and with humans, its and use, and the possibility that of could and that we these into how we approach the development of of the research and development, and in AI is of the AI and in its what Nigel Shadbolt the of AI. This is given the for useful and applications and the for in sectors of the However, a few have made the development of their the most of these are and each of which has demonstrated results of increasing still a long way from the most discussed impact of AI and automation is on and the future of This is not In in the of the excitement about AI and and concerns about their impact on a on and the was that such technologies were important for growth and and "the that but not Most recent of this including those I have been involved have and that over time, more are than are that it is the and the and the of will the In their essay AI & and John discuss these for work and further, in & the of & to discuss the with to and and as well as the opportunities that are especially in developing In "The Turing The & of Artificial Intelligence," discusses how the use of human benchmarks in the development of AI the of AI that rather than human He that the AI's development will take in this and resulting for will on the for companies, and a that the that more will be than too much from of the and does not far enough into the future and at what AI will be capable The for AI could from of that in the is and labor and ability to are and and until automation has mostly physical and but that AI will be on more cognitive and tasks based on and, if early examples are even tasks are not of the In other are now in the world machines that that learn and that their ability to do these is to a range of problems they can will be with the range to which the human has been This was and Allen Newell in that this time could be different usually two that new labor will in which will by other humans for their own even when machines may be capable of these as well as or even better than The other is that AI will create so much and all without the for human and the of will be to for when that will the that once the first time since his creation will be with his his to use his from how to the which science and interest will have for to live and and However, most researchers that we are not to a future in which the of will and that until then, there are other and that be in the labor now and in the such as and other and how humans work increasingly capable that and John and discuss in this are not the only of the by AI. Russell a of the potentially from artificial general intelligence, once a of or ten But even we to general-purpose AI, the opportunities for companies and, for the and growth as well as from AI and its related technologies are more than to pursuit and by companies and in the development, and use of AI. At the many the is it is generally that is a in AI, as by its growth in AI research, and as highlighted in several will have for companies and given the of such technologies as discussed by and others the may in the way of approaches to AI and (such as whether they are companies or as and have have the to to in AI. The role of AI in intelligence, systems, autonomous even and other of increasingly In &
- Research Article
- 10.54097/0pbd1t53
- Jan 29, 2026
- Academic Journal of Science and Technology
Large language models (LLMs) have made remarkable strides in natural language tasks, but mathematical problem solving remains a significant challenge. This paper surveys recent methods developed to enhance the mathematical reasoning abilities of LLMs. Techniques such as chain-of-thought prompting, self-consistency decoding, retrieval-based augmentation, and tool use have markedly improved performance on math benchmarks. Despite progress, the LLMs still struggle with reasoning faithfulness, generalization to new problems, and arithmetic accuracy. This paper identifies three key challenges and discuss corresponding future directions, including structured reasoning formats, automatic verification, and targeted training. Bridging the gap between current LLM performance and human-level mathematical proficiency will not only advance AI’s capabilities in math but also improve the transparency and reliability of reasoning in AI systems more broadly. Ultimately, strengthening LLMs in mathematics represents a step toward Artificial Intelligence (AI) systems that can provide trustworthy, interpretable, and high-value support in education, scientific research, and other domains requiring precise reasoning.
- Discussion
- 10.1016/j.lanwpc.2024.101146
- Jul 1, 2024
- The Lancet Regional Health - Western Pacific
Unveiling the black box: imperative for explainable AI in cardiovascular disease (CVD) prevention–author reply
- Conference Article
8
- 10.1109/idap64064.2024.10710677
- Sep 21, 2024
As technology develops day by day, significant developments have been made in the field of artificial intelligence (AI). In particular, machine learning (ML) and deep learning (DL), as the main technologies that form the basis of artificial intelligence, have offered revolutionary innovations and laid the foundation for future technologies. Traditional artificial intelligence models are based on algorithms that show high performance in certain tasks such as classification, scoring, prediction, and pattern recognition. These algorithms are developed to best perform a specific task, making it difficult for artificial intelligence to be sufficiently effective in areas that require flexibility. Generative artificial intelligence, which has become widespread in recent years, has the ability to produce certain types of content in addition to the competencies of traditional artificial intelligence models. This has revolutionized the field of productivity in artificial intelligence. Generative artificial intelligence language models have gone beyond the limitations and started a new era in artificial intelligence applications. Where traditional artificial intelligence models are limited, language models have come into play, especially with their natural language processing (NLP) capabilities. Rather than just analyzing data, language models can learn the rules of the language and provide human-like responses, produce text, and offer a wider range of applications. In this way, artificial intelligence systems have become more flexible, extensible, and dynamic. With the rise of language models in this field, concepts such as large language models (LLM) and small language models (SLM) have emerged. Large language models have come to the fore as systems that can provide deep knowledge and language production on a wide variety of topics by being trained on huge data sets. Large language models such as ChatGPT are one of the most common and impressive examples in this field. However, small language models, which are smaller and specialized language models, have begun to be used as an alternative to large language models in certain areas because they require less data and processing power. Small language models stand out with their lighter but targeted performance, offering effective solutions, especially in situations where there are resource limitations. At this point, using both large and small versions of language models in the right scenarios provides great advantages in terms of sustainability and efficiency. This study aims to reveal the transformative effect of technology on artificial intelligence and the critical role of language models in this process by evaluating language models and the issues to be considered in the selection of these models.
- Research Article
56
- 10.5204/mcj.3004
- Oct 2, 2023
- M/C Journal
Introduction Author Arthur C. Clarke famously argued that in science fiction literature “any sufficiently advanced technology is indistinguishable from magic” (Clarke). On 30 November 2022, technology company OpenAI publicly released their Large Language Model (LLM)-based chatbot ChatGPT (Chat Generative Pre-Trained Transformer), and instantly it was hailed as world-changing. Initial media stories about ChatGPT highlighted the speed with which it generated new material as evidence that this tool might be both genuinely creative and actually intelligent, in both exciting and disturbing ways. Indeed, ChatGPT is part of a larger pool of Generative Artificial Intelligence (AI) tools that can very quickly generate seemingly novel outputs in a variety of media formats based on text prompts written by users. Yet, claims that AI has become sentient, or has even reached a recognisable level of general intelligence, remain in the realm of science fiction, for now at least (Leaver). That has not stopped technology companies, scientists, and others from suggesting that super-smart AI is just around the corner. Exemplifying this, the same people creating generative AI are also vocal signatories of public letters that ostensibly call for a temporary halt in AI development, but these letters are simultaneously feeding the myth that these tools are so powerful that they are the early form of imminent super-intelligent machines. For many people, the combination of AI technologies and media hype means generative AIs are basically magical insomuch as their workings seem impenetrable, and their existence could ostensibly change the world. This article explores how the hype around ChatGPT and generative AI was deployed across the first six months of 2023, and how these technologies were positioned as either utopian or dystopian, always seemingly magical, but never banal. We look at some initial responses to generative AI, ranging from schools in Australia to picket lines in Hollywood. We offer a critique of the utopian/dystopian binary positioning of generative AI, aligning with critics who rightly argue that focussing on these extremes displaces the more grounded and immediate challenges generative AI bring that need urgent answers. Finally, we loop back to the role of schools and educators in repositioning generative AI as something to be tested, examined, scrutinised, and played with both to ground understandings of generative AI, while also preparing today’s students for a future where these tools will be part of their work and cultural landscapes. Hype, Schools, and Hollywood In December 2022, one month after OpenAI launched ChatGPT, Elon Musk tweeted: “ChatGPT is scary good. We are not far from dangerously strong AI”. Musk’s post was retweeted 9400 times, liked 73 thousand times, and presumably seen by most of his 150 million Twitter followers. This type of engagement typified the early hype and language that surrounded the launch of ChatGPT, with reports that “crypto” had been replaced by generative AI as the “hot tech topic” and hopes that it would be “‘transformative’ for business” (Browne). By March 2023, global economic analysts at Goldman Sachs had released a report on the potentially transformative effects of generative AI, saying that it marked the “brink of a rapid acceleration in task automation that will drive labor cost savings and raise productivity” (Hatzius et al.). Further, they concluded that “its ability to generate content that is indistinguishable from human-created output and to break down communication barriers between humans and machines reflects a major advancement with potentially large macroeconomic effects” (Hatzius et al.). Speculation about the potentially transformative power and reach of generative AI technology was reinforced by warnings that it could also lead to “significant disruption” of the labour market, and the potential automation of up to 300 million jobs, with associated job losses for humans (Hatzius et al.). In addition, there was widespread buzz that ChatGPT’s “rationalization process may evidence human-like cognition” (Browne), claims that were supported by the emergent language of ChatGPT. The technology was explained as being “trained” on a “corpus” of datasets, using a “neural network” capable of producing “natural language“” (Dsouza), positioning the technology as human-like, and more than ‘artificial’ intelligence. Incorrect responses or errors produced by the tech were termed “hallucinations”, akin to magical thinking, which OpenAI founder Sam Altman insisted wasn’t a word that he associated with sentience (Intelligencer staff). Indeed, Altman asserts that he rejects moves to “anthropomorphize” (Intelligencer staff) the technology; however, arguably the language, hype, and Altman’s well-publicised misgivings about ChatGPT have had the combined effect of shaping our understanding of this generative AI as alive, vast, fast-moving, and potentially lethal to humanity. Unsurprisingly, the hype around the transformative effects of ChatGPT and its ability to generate ‘human-like’ answers and sophisticated essay-style responses was matched by a concomitant panic throughout educational institutions. The beginning of the 2023 Australian school year was marked by schools and state education ministers meeting to discuss the emerging problem of ChatGPT in the education system (Hiatt). Every state in Australia, bar South Australia, banned the use of the technology in public schools, with a “national expert task force” formed to “guide” schools on how to navigate ChatGPT in the classroom (Hiatt). Globally, schools banned the technology amid fears that students could use it to generate convincing essay responses whose plagiarism would be undetectable with current software (Clarence-Smith). Some schools banned the technology citing concerns that it would have a “negative impact on student learning”, while others cited its “lack of reliable safeguards preventing these tools exposing students to potentially explicit and harmful content” (Cassidy). ChatGPT investor Musk famously tweeted, “It’s a new world. Goodbye homework!”, further fuelling the growing alarm about the freely available technology that could “churn out convincing essays which can't be detected by their existing anti-plagiarism software” (Clarence-Smith). Universities were reported to be moving towards more “in-person supervision and increased paper assessments” (SBS), rather than essay-style assessments, in a bid to out-manoeuvre ChatGPT’s plagiarism potential. Seven months on, concerns about the technology seem to have been dialled back, with educators more curious about the ways the technology can be integrated into the classroom to good effect (Liu et al.); however, the full implications and impacts of the generative AI are still emerging. In May 2023, the Writer’s Guild of America (WGA), the union representing screenwriters across the US creative industries, went on strike, and one of their core issues were “regulations on the use of artificial intelligence in writing” (Porter). Early in the negotiations, Chris Keyser, co-chair of the WGA’s negotiating committee, lamented that “no one knows exactly what AI’s going to be, but the fact that the companies won’t talk about it is the best indication we’ve had that we have a reason to fear it” (Grobar). At the same time, the Screen Actors’ Guild (SAG) warned that members were being asked to agree to contracts that stipulated that an actor’s voice could be re-used in future scenarios without that actor’s additional consent, potentially reducing actors to a dataset to be animated by generative AI technologies (Scheiber and Koblin). In a statement issued by SAG, they made their position clear that the creation or (re)animation of any digital likeness of any part of an actor must be recognised as labour and properly paid, also warning that any attempt to legislate around these rights should be strongly resisted (Screen Actors Guild). Unlike the more sensationalised hype, the WGA and SAG responses to generative AI are grounded in labour relations. These unions quite rightly fear the immediate future where human labour could be augmented, reclassified, and exploited by, and in the name of, algorithmic systems. Screenwriters, for example, might be hired at much lower pay rates to edit scripts first generated by ChatGPT, even if those editors would really be doing most of the creative work to turn something clichéd and predictable into something more appealing. Rather than a dystopian world where machines do all the work, the WGA and SAG protests railed against a world where workers would be paid less because executives could pretend generative AI was doing most of the work (Bender). The Open Letter and Promotion of AI Panic In an open letter that received enormous press and media uptake, many of the leading figures in AI called for a pause in AI development since “advanced AI could represent a profound change in the history of life on Earth”; they warned early 2023 had already seen “an out-of-control race to develop and deploy ever more powerful digital minds that no one – not even their creators – can understand, predict, or reliably control” (Future of Life Institute). Further, the open letter signatories called on “all AI labs to immediately pause for at least 6 months the training of AI systems more powerful than GPT-4”, arguing that “labs and independent experts should use this pause to jointly develop and implement a set of shared safety protocols for advanced AI design and development that are rigorously audited and overseen by independent outside experts” (Future of Life Institute). Notably, many of the signatories work for the very companies involved in the “out-of-control race”. Indeed, while this letter could be read as a moment of ethical clarity for the AI industry, a more cynical reading might just be that in warning that their AIs could effectively destroy the w
- Research Article
6
- 10.1111/risa.14353
- Jun 30, 2024
- Risk analysis : an official publication of the Society for Risk Analysis
This article presents a risk analysis of large language models (LLMs), a type of "generative" artificial intelligence (AI) system that produces text, commonly in response to textual inputs from human users. The article is specifically focused on the risk of LLMs causing an extreme catastrophe in which they do something akin to taking over the world and killing everyone. The possibility of LLM takeover catastrophe has been a major point of public discussion since the recent release of remarkably capable LLMs such as ChatGPT and GPT-4. This arguably marks the first time when actual AI systems (and not hypothetical future systems) have sparked concern about takeover catastrophe. The article's analysis compares (A) characteristics of AI systems that may be needed for takeover, as identified in prior theoretical literature on AI takeover risk, with (B) characteristics observed in current LLMs. This comparison reveals that the capabilities of current LLMs appear to fall well short of what may be needed for takeover catastrophe. Future LLMs may be similarly incapable due to fundamental limitations of deep learning algorithms. However, divided expert opinion on deep learning and surprise capabilities found in current LLMs suggests some risk of takeover catastrophe from future LLMs. LLM governance should monitor for changes in takeover characteristics and be prepared to proceed more aggressively if warning signs emerge. Unless and until such signs emerge, more aggressive governance measures may be unwarranted.
- Conference Article
- 10.1145/3711875.3729128
- Jun 23, 2025
While large language models (LLMs) are endowed with broad knowledge, their task-specific performance is often suboptimal. Fine-tuning LLMs with task-specific data from diverse nodes is necessary, but this data is typically safeguarded and not shared publicly due to privacy concerns. A common solution involves downstream nodes downloading the LLM locally and fine-tuning it with their proprietary data. However, owners often regard pre-trained LLMs as valuable assets and are reluctant to share them. Additionally, the significant computational resources required by LLMs make local fine-tuning impractical for many nodes. To mitigate these problems, this paper proposes CrossLM, a data-free collaborative fine-tuning framework for large and small language models. CrossLM enables resource-constrained nodes to train smaller language models (SLMs) using their private task-specific data. These SLMs are subsequently leveraged to promote the task-specific natural language generation and understanding capabilities of the LLMs. Simultaneously, the SLMs of nodes also benefit from enhancement by the fine-tuned LLMs. In this way, CrossLM avoids sharing private data and proprietary LLMs, and also reduces the resource requirements of nodes. Through extensive experiments across a range of benchmark tasks and popular language models, we demonstrate that CrossLM significantly boosts the task-specific performance of both LLMs and SLMs while preserving the generalization capabilities of LLMs.
- Front Matter
1
- 10.3389/frai.2024.1516832
- Nov 29, 2024
- Frontiers in artificial intelligence
In today’s rapidly evolving business landscape, Artificial Intelligence (AI), and specifically Large Language Models (LLMs), are redefining how organizations operate, make decisions, and engage with customers. AI-driven technologies have become indispensable, providing businesses with powerful tools to streamline operations, derive actionable insights from vast data, and foster more meaningful customer interactions. For business leaders, scholars, and practitioners alike, understanding the transformative potential of AI isn’t just advantageous—it’s essential to staying competitive in an increasingly data-driven world.This editorial delves into recent scholarly advancements in LLM applications within business contexts, analyzing studies that explore AI’s potential across various domains, from decision support to creative industries. By introducing a structured framework, this editorial highlights key insights and contributions from recent studies, assessing their value to academia and industry. The following comparative analysis sheds light on how these innovations shape our understanding of AI’s role in business while pointing to future research directions.Puyt and Madsen's (2024) study stands out as a foundational exploration of LLM accuracy, assessing ChatGPT-4's ability to recount the history of the SWOT analysis-a vital business strategy tool. Their findings reveal that, while ChatGPT-4 effectively conveys general concepts, it struggles with detailed historical information, often producing inaccuracies or "hallucinations." This gap underscores the need for LLMs to be trained with verified academic data, particularly for strategic business applications that demand precision. This study not only contributes to the literature by proposing methods to evaluate AI accuracy in historical contexts but also highlights the importance of rigorous information vetting in industry settings where reliability is crucial.In contrast, Raikov et al. (2024) explore a hybrid intelligence model that combines LLM capabilities with explainable AI (XAI) principles to enhance human-machine collaboration. Their approach emphasizes cognitive semantics, improving transparency and decision-making efficiency. The hybrid model's real-time adaptability addresses the needs of complex, regulated industries such as finance and healthcare, where trust in AI decisions is paramount. Academically, this study provides a valuable addition to XAI literature by demonstrating how LLMs can bridge the gap between AI autonomy and human oversight, making it a model for future human-AI interactions in complex business environments.Another significant study by Mariotti and colleagues (2024) examines the integration of LLMs with enterprise knowledge graphs to enhance data-driven decision-making. By enabling organizations to leverage knowledge graphs for more accurate and scalable data retrieval, this research provides a robust framework for businesses seeking efficient knowledge management systems. The academic contribution here lies in advancing the dialogue between LLMs and knowledge graphs, emphasizing ethical data handling and quality standards essential for industry applications. For enterprises, the study offers practical solutions to achieve streamlined data management, balancing automation with privacy and security. 2024) take a different approach, investigating LLMs' role in creative industries, specifically within fashion design. They introduce a hybrid intelligence model that supports creative processes, allowing AI to complement rather than replace human ingenuity. While LLMs in this field demonstrate potential in automating repetitive design tasks and enhancing customer personalization, the study reveals limitations in AI's ability to handle spatial and stylistic nuances. This study's academic contribution lies in promoting human-AI co-creation, inspiring further research into AI applications across diverse creative sectors, including media and marketing.Collectively, these studies not only illuminate LLMs' transformative potential in business but also highlight critical ethical and operational considerations. Ensuring accuracy, transparency, and data privacy are vital to responsibly integrating AI into business workflows. Future research should focus on enhancing LLM accuracy, refining hybrid intelligence models, and exploring creative AI applications, all while maintaining ethical standards. As LLMs evolve, interdisciplinary collaborations will be essential to harness their full potential, making AI an ethical, effective, and innovative force in the business world.
- Research Article
- 10.1093/bjrai/ubaf010
- Aug 13, 2025
- BJR|Artificial Intelligence
Natural Language Processing (NLP) is a key technique for developing Medical Artificial Intelligence (AI) systems that leverage Electronic Health Record (EHR) data to build diagnostic and prognostic models. NLP enables the conversion of unstructured clinical text into structured data that can be fed into AI algorithms. The emergence of transformer architecture and large language models (LLMs) has led to advances in NLP for various healthcare tasks, such as entity recognition, relation extraction, sentence similarity, text summarization, and question-answering. In this article, we review the major technical innovations that underpin modern NLP models and present state-of-the-art NLP applications that employ LLMs in radiation oncology research. However, it is crucial to recognize that LLMs are prone to hallucinations, biases, and ethical violations, which necessitate rigorous evaluation and validation prior to clinical deployment. As such, we propose a comprehensive framework for assessing the NLP models based on their purpose and clinical fit, technical performance, bias and trust, legal and ethical implications, and quality assurance prior to implementation in clinical radiation oncology. Our article aims to provide guidance and insights for researchers and clinicians who are interested in developing and using NLP models in clinical radiation oncology. Natural Language Processing (NLP) is a key technique for developing Medical Artificial Intelligence (AI) systems that leverage Electronic Health Record (EHR) data to build diagnostic and prognostic models. NLP enables the conversion of unstructured clinical text into structured data that can be fed into AI algorithms. The emergence of transformer architecture and large language models (LLMs) has led to advances in NLP for various healthcare tasks, such as entity recognition, relation extraction, sentence similarity, text summarization, and question-answering. In this article, we review the major technical innovations that underpin modern NLP models and present state-of-the-art NLP applications that employ LLMs in radiation oncology research. However, it is crucial to recognize that LLMs are prone to hallucinations, biases, and ethical violations, which necessitate rigorous evaluation and validation prior to clinical deployment. As such, we propose a comprehensive framework for assessing the NLP models based on their purpose and clinical fit, technical performance, bias and trust, legal and ethical implications, and quality assurance prior to implementation in clinical radiation oncology. Our article aims to provide guidance and insights for researchers and clinicians who are interested in developing and using NLP models in clinical radiation oncology.
- Research Article
2
- 10.34190/eckm.25.1.2482
- Sep 3, 2024
- European Conference on Knowledge Management
As the diversity and complexity of Artificial Intelligence (AI) systems increase, there is a growing need for advanced knowledge representation methods to enhance decision-making capabilities. Existing research indicates a gap between AI and Knowledge Management (KM), emphasizing the necessity of coordinating learning and knowledge creation processes between humans and machines. Despite the widespread use of generative AI, as seen through the growing popularity of conversational AI tools like chatbots powered by Large Language Models in recent years, the absence of a theoretical framework for effectively managing the knowledge they generate could mean missing out on significant opportunities. This work seeks to bridge this gap between KM and IA through an integrated framework that aims to apply KM to support IA chatbot applications, adapted from the Internet of Everything Integrated Knowledge Management Model (IoE IKM Model). The IoE IKM Model’s original goal is to support knowledge creation in IoE applications, but here, we show how it can be adapted to bring KM to the context of AI. We accomplish this by explaining the development process of the IoE IKM Model, identifying shared aspects between IoE and AI general applications, and adapting necessary elements to establish our integrated KM framework tailored for supporting AI chatbot applications. The resulting framework is then discussed, and examples of how it can be applied to enhance human interaction with a chatbot, namely Open AI's ChatGPT. Research has been conducted to demonstrate the advantages of applying AI in KM. However, we aim to take a different approach by showing how KM can contribute to AI applications. We expect this work to be helpful for those whose professional activities may involve the usage of AI systems by providing them with the necessary tools to manage the knowledge generated by these same AI systems and by offering a Knowledge Manager’s perspective on how to boost human-machine interaction.
- Research Article
38
- 10.1007/s10676-024-09777-3
- Jun 18, 2024
- Ethics and Information Technology
This paper analyses the phenomenology and epistemology of chatbots such as ChatGPT and Bard. The computational architecture underpinning these chatbots are large language models (LLMs), which are generative artificial intelligence (AI) systems trained on a massive dataset of text extracted from the Web. We conceptualise these LLMs as multifunctional computational cognitive artifacts, used for various cognitive tasks such as translating, summarizing, answering questions, information-seeking, and much more. Phenomenologically, LLMs can be experienced as a “quasi-other”; when that happens, users anthropomorphise them. For most users, current LLMs are black boxes, i.e., for the most part, they lack data transparency and algorithmic transparency. They can, however, be phenomenologically and informationally transparent, in which case there is an interactional flow. Anthropomorphising and interactional flow can, in some users, create an attitude of (unwarranted) trust towards the output LLMs generate. We conclude this paper by drawing on the epistemology of trust and testimony to examine the epistemic implications of these dimensions. Whilst LLMs generally generate accurate responses, we observe two epistemic pitfalls. Ideally, users should be able to match the level of trust that they place in LLMs to the degree that LLMs are trustworthy. However, both their data and algorithmic opacity and their phenomenological and informational transparency can make it difficult for users to calibrate their trust correctly. The effects of these limitations are twofold: users may adopt unwarranted attitudes of trust towards the outputs of LLMs (which is particularly problematic when LLMs hallucinate), and the trustworthiness of LLMs may be undermined.
- Conference Article
- 10.1109/mlcipr68329.2025.11407321
- Dec 19, 2025
While traditional cybersecurity techniques are increasingly inadequate in dealing with the volume and complexity of modern cyber threats, this research aims to address the growing need for robust cybersecurity measures tailored to the unique challenges of artificial intelligence (AI) systems. A three-step Cyber Threat Intelligence pipeline that uses Large Language Models (LLMs) is proposed, in which web-scale sources are mined based on queries in step 1, emerging threats are detected and classified according to their tactics and techniques in step 2, and visualised in step 3. AI techniques such as LLMs and Retrieval-Augmented Generation are heavily utilised in this research. We evaluate the pipeline with an internally curated dataset aligned to a public framework for attacks on AI systems and report strong accuracy along with practical system integration outcomes.
- Research Article
- 10.1186/s12871-026-03814-y
- Apr 10, 2026
- BMC anesthesiology
Postoperative pain management is a core component of anesthesiology practice, with regional anesthesia playing a key role in multimodal analgesia strategies. Large language model (LLM)-based artificial intelligence (AI) systems are increasingly proposed as clinical decision support tools; however, their ability to integrate critical perioperative context, such as the presence of an existing regional block, remains insufficiently explored. This prospective, observational, comparative study included 144 adult patients undergoing elective abdominal surgery at a tertiary care center, after exclusion of four patients due to severe preoperative or intraoperative complications that significantly altered the planned postoperative analgesia. Patients were grouped according to the presence or absence of a regional block (70 per group). For each patient, anonymized and standardized clinical scenarios were evaluated independently by three LLM-based AI systems (ChatGPT, Gemini, and Copilot) to generate postoperative analgesia recommendations. AI outputs were assessed by blinded anesthesiology experts for opioid recommendation, multimodal analgesia, consideration of regional anesthesia, and overall clinical appropriateness using a 5-point Likert scale. Multivariable logistic and ordinal logistic regression analyses were performed to determine the independent effect of regional block presence, adjusting for relevant clinical covariates. Agreement between AI recommendations and actual clinical practice was evaluated using Cohen's kappa. Regional block presence was not independently associated with opioid recommendations generated by any AI system (all p > 0.05). However, the likelihood of recommending an additional regional block was significantly reduced by ChatGPT (adjusted odds ratio [aOR] 0.02, p < 0.001) and Copilot (aOR 0.15, p = 0.019). Gemini demonstrated complete separation, consistently recommending regional blocks only in patients without an existing block. Multimodal analgesia was universally recommended by ChatGPT and Gemini, precluding regression analysis. Expert evaluation scores were significantly higher in scenarios with an existing regional block across all AI systems. Overall agreement between AI-generated recommendations and real-world clinical decisions was limited. LLM-based AI systems demonstrate partial contextual awareness of regional anesthesia when generating postoperative analgesia recommendations. However, this awareness does not consistently translate into concordance with real-world clinical practice. These findings support the use of AI as an adjunctive decision support tool rather than a substitute for clinician judgment in postoperative pain management.
- Conference Article
1
- 10.1145/3711896.3737419
- Aug 3, 2025
Large Language Models (LLMs) have revolutionized interactions between human and artificial intelligence (AI) systems, demonstrating state-of-the-art performance across various domains, including scientific discovery and hypothesis generation. However, the absence of a comprehensive and systematic evaluation framework for LLM-driven research idea generation hinders a rigorous understanding of their strengths and limitations. To address this gap, we propose IdeaBench, a benchmark system that provides a structured dataset and evaluation framework for standardizing the assessment of research idea generation by LLMs. Our dataset comprises titles and abstracts from 2,374 influential papers across eight research domains, along with their 29,408 referenced works, creating a context-rich environment that mirrors human researchers' ideation processes. By profiling LLMs as domain-specific researchers and grounding them in similar contextual constraints, we directly leverage the models' knowledge learned from the pre-training stage to generate new research ideas. To systematically evaluate LLMs' research ideation capability and approximate human assessment, we propose a reference-based metric that aligns with human judgment to quantify idea quality with the assistance of LLMs. Through this evaluation, we find that while LLMs excel at generating novel ideas, they may struggle with generating feasible ideas. IdeaBench serves as a critical resource for benchmarking and comparing LLMs, ultimately advancing research on AI's role in automating scientific discovery.