The radiologist as a physician - artificial intelligence as a way to overcome tension between the patient, technology, and referring physicians - a narrative review.
Large volumes of data increasing over time lead to a shortage of radiologists' time. The use of systems based on artificial intelligence (AI) offers opportunities to relieve the burden on radiologists. The AI systems are usually optimized for a radiological area. Radiologists must understand the basic features of a system's technical function in order to be able to assess its weaknesses and possible errors and use its strengths. This "explainability" creates trust in an AI system and shows its limits. Based on an expanded Medline search for the key words "radiology, artificial intelligence, referring physician interaction, patient interaction, job satisfaction, communication of findings, expectations", additional subjectively selected relevant articles were considered for this narrative review. The use of AI is well advanced, especially in radiology. The programmer should provide the radiologist with clear explanations as to how the system works. All systems on the market have strengths and weaknesses. Some of the optimizations are unintentionally specific, as they are often adapted too precisely to a certain environment that often does not exist in practice - this is known as "overfitting". It should also be noted that there are specific weak points in the systems, so-called "adversarial examples", which lead to fatal misdiagnoses by the AI even though these cannot be visually distinguished from an unremarkable finding by the radiologist. The user must know which diseases the system is trained for, which organ systems are recognized and taken into account by the AI, and, accordingly, which are not properly assessed. This means that the user can and must critically review the results and adjust the findings if necessary. Correctly applied AI can result in time savings for the radiologist. Radiologists who know how the system works only have to spend a short amount of time checking the results. 
The time saved can be used for communication with patients and referring physicians and thus contribute to higher job satisfaction. Radiology is a constantly evolving specialty with enormous responsibility, as radiologists often make the diagnosis to be treated. AI-supported systems should be used consistently to provide relief and support. Radiologists need to know the strengths, weaknesses, and areas of application of these AI systems in order to save time. The time gained can be used for communication with patients and referring physicians. · Explainable AI systems help to improve workflow and to save time. · The physician must critically review AI results, under consideration of the limitations of the AI. · The AI system will only provide useful results if it has been adapted to the data type and data origin. · The communicating radiologist interested in the patient is important for the visibility of the discipline. · Stueckle CA, Haage P. The radiologist as a physician - artificial intelligence as a way to overcome tension between the patient, technology, and referring physicians - a narrative review. Fortschr Röntgenstr 2024; 196: 1115 - 1123.
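The "overfitting" described in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy data, the two noisy training labels, and the hand-picked threshold of 5 are invented and do not come from the review. A model that memorizes its training set scores perfectly on that set but does worse on new cases than a simpler rule.

```python
# Toy 1-D data, invented for illustration: the underlying rule is "label 1 if x > 5",
# but two training labels (x = 3 and x = 7) are deliberately noisy.
TRAIN = [(1, 0), (2, 0), (3, 1), (4, 0), (6, 1), (7, 0), (8, 1), (9, 1)]
TEST = [(1.5, 0), (2.6, 0), (3.4, 0), (6.6, 1), (7.4, 1), (8.5, 1)]


def memorizer(x):
    """1-nearest-neighbour model: returns the label of the closest training point."""
    nearest = min(TRAIN, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def threshold_rule(x):
    """Simple rule with a hand-picked (hypothetical) threshold of 5."""
    return 1 if x > 5 else 0


def accuracy(model, data):
    """Fraction of (x, label) pairs that the model classifies correctly."""
    return sum(model(x) == y for x, y in data) / len(data)


if __name__ == "__main__":
    for name, model in [("memorizer", memorizer), ("threshold", threshold_rule)]:
        print(name, accuracy(model, TRAIN), accuracy(model, TEST))
```

The memorizer reproduces the noisy training labels exactly, so its training accuracy overstates how well it handles cases it has not seen. This is the behaviour the abstract warns about when a system is adapted too precisely to one environment.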
- Preprint Article
- 10.2196/preprints.73124
- Feb 25, 2025
BACKGROUND The field of artificial intelligence (AI) has expanded rapidly in recent years. Generally, AI is viewed as a “black box” since understanding how it came to its presented solution is nearly impossible, which causes mistrust among end-users. This presents a problem, especially when AI is supposed to be implemented in high-stakes decision work environments. An example of such a work environment is the health care system. In addition to the general mistrust, there are also legal regulations in place for the implementation of AI systems within the health care system. The mistrust and legal regulations create a strong barrier for the widespread implementation of AI methods across the health care sector. To improve trust in artificial intelligence systems and to fulfill the legal requirements, there has been a need for transparent, interpretable, explainable artificial intelligence systems. However, rather than developing new AI models, many researchers are working on post-hoc explainable artificial intelligence (XAI) systems which could at least provide the legally needed amount of transparency. Nevertheless, to ensure their usability, the created systems must be explainable to the end-user. OBJECTIVE The goal of this systematic review was to identify the number of evaluations done on the usability, user satisfaction, experience and trust of XAI systems in the health care system. We also aimed to find the most used methods for usability/user experience evaluations. METHODS Following the PRISMA 2020 guidelines, we extracted 6,008 references from four databases. After concluding our screening steps, 134 results remained eligible for the systematic review. The publications were categorized into 26 medical, 102 XAI method and 15 evaluation categories. RESULTS 12 of the 15 evaluation categories were user-based. Only 35 of the 134 papers were sorted into user-based evaluation categories. 
A large portion of the 35 publications used self-designed questionnaires. Only 3 of the 35 presented a User Centered Design-Process. Our hypothesis that XAI is rarely evaluated, let alone developed, in relation to the needs of the end-user was confirmed. CONCLUSIONS We conclude that there is still a strong need for more involvement of the end-user during the development or at least during the evaluation of the created XAI models. Additionally, we recommend the development of a standardized framework to improve the generalizability of XAI methods. If XAI isn’t developed closer to the needs of the end-user, evaluated by the end-user, or at best developed with the users, we expect that the implementation of explainable artificial intelligence in the health care environment will get increasingly hard.
- Book Chapter
27
- 10.1007/978-3-030-04070-3_1
- Jan 1, 2018
The recent advances in computing power coupled with the rapid increases in the quantity of available data has led to a resurgence in the theory and applications of Artificial Intelligence (AI). However, the use of complex AI algorithms like Deep Learning, Random Forests, etc., could result in a lack of transparency to users which is termed as black/opaque box models. Thus, for AI to be trusted and widely used by governments and industries, there is a need for greater transparency through the creation of explainable AI (XAI) systems. In this paper, we introduce the concepts of XAI and give an overview of hybrid systems which employ fuzzy logic systems which can hold great promise for creating trusted and explainable AI systems.
- Research Article
4
- 10.46610/rtaia.2024.v03i01.001
- Mar 26, 2024
- Research & Review: Machine Learning and Cloud Computing
As Artificial Intelligence (AI) systems become more widespread, there is a growing need for transparency to ensure human understanding and oversight. This is where Explainable AI (XAI) comes in to make AI systems more transparent and interpretable. However, developing adequate explanations is still an open research problem. Human-Computer Interaction (HCI) is significant in designing interfaces for explainable AI. This article reviews the HCI techniques that can be used for explainable AI systems. The literature was explored with a focus on papers at the intersection of HCI and XAI. Essential techniques include interactive visualizations, natural language explanations, conversational agents, mixed-initiative systems, and model introspection methods. While Explainable AI presents opportunities to improve system transparency, it also comes with risks, especially if the explanations are not designed carefully. To ensure that explanations are tailored for diverse users, contexts, and AI applications, HCI principles and participatory design approaches can be utilized. Therefore, this article concludes with recommendations for developing human-centred XAI systems, which can be achieved through interdisciplinary collaboration between HCI and AI. As Artificial Intelligence (AI) systems become more common in our daily lives, the need for transparency in these systems is becoming increasingly important. Ensuring that humans clearly understand how AI systems work and can oversee their functioning is crucial. This is where the concept of Explainable AI (XAI) comes in to make AI systems more transparent and interpretable. However, developing adequate explanations for AI systems is still an open research problem. In this context, Human-Computer Interaction (HCI) is significant in designing interfaces for explainable AI. By integrating HCI principles, we can create systems that humans understand and operate more efficiently. 
This article reviews the HCI techniques that can be used for explainable AI systems. The literature was explored with a focus on papers at the intersection of HCI and XAI. The essential methods identified include interactive visualizations, natural language explanations, conversational agents, mixed-initiative systems, and model introspection methods. Each of these techniques has unique advantages and can be used to provide explanations for different types of AI systems. While Explainable AI presents opportunities to improve system transparency, it also comes with risks, especially if the explanations are not designed carefully. There is a risk of oversimplification, leading to misunderstanding or mistrust of the AI system. It is essential to employ HCI principles and participatory design approaches to ensure that explanations are tailored for diverse users, contexts, and AI applications. By developing human-centred XAI systems, we can ensure that AI systems are transparent, interpretable, and trustworthy. This can be achieved through interdisciplinary collaboration between HCI and AI. The recommendations in this article provide a starting point for designing such systems. In essence, XAI presents a significant opportunity to improve the transparency of AI systems, but it requires careful design and implementation to be effective.
- Research Article
50
- 10.1016/j.fertnstert.2020.10.040
- Nov 1, 2020
- Fertility and Sterility
Predictive modeling in reproductive medicine: Where will the future of artificial intelligence research take us?
- Discussion
11
- 10.1016/s2589-7500(22)00094-2
- Jun 21, 2022
- The Lancet Digital Health
Artificial intelligence to complement rather than replace radiologists in breast screening
- Conference Article
40
- 10.1109/ijcnn48605.2020.9207472
- Jul 1, 2020
In this work we present a formal theoretical framework for assessing and analyzing two classes of malevolent action towards generic Artificial Intelligence (AI) systems. Our results apply to general multi-class classifiers that map from an input space into a decision space, including artificial neural networks used in deep learning applications. Two classes of attacks are considered. The first class involves adversarial examples and concerns the introduction of small perturbations of the input data that cause misclassification. The second class, introduced here for the first time and named stealth attacks, involves small perturbations to the AI system itself. Here the perturbed system produces whatever output is desired by the attacker on a specific small data set, perhaps even a single input, but performs as normal on a validation set (which is unknown to the attacker). We show that in both cases, i.e., in the case of an attack based on adversarial examples and in the case of a stealth attack, the dimensionality of the AI's decision-making space is a major contributor to the AI's susceptibility. For attacks based on adversarial examples, a second crucial parameter is the absence of local concentrations in the data probability distribution, a property known as Smeared Absolute Continuity. According to our findings, robustness to adversarial examples requires either (a) the data distributions in the AI's feature space to have concentrated probability density functions or (b) the dimensionality of the AI's decision variables to be sufficiently small. We also show how to construct stealth attacks on high-dimensional AI systems that are hard to spot unless the validation set is made exponentially large.
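The link between input dimensionality and susceptibility to adversarial examples can be made concrete with a minimal sketch. It is not the formal framework of the paper above: it assumes a plain linear classifier with hypothetical weights and inputs, and uses the standard sign-based perturbation to find the smallest uniform per-feature change that flips the decision.

```python
def linear_score(weights, bias, x):
    """Signed score of a linear classifier; a positive score means class 1."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return 1 if linear_score(weights, bias, x) > 0 else 0


def sign(v):
    return (v > 0) - (v < 0)


def minimal_linf_attack(weights, bias, x, margin=1e-3):
    """Shift every feature by the same small amount eps against the current decision.

    For a linear model, moving each feature by eps in the direction sign(w)
    changes the score by eps * sum(|w|), so eps = (|score| + margin) / sum(|w|)
    is just enough to push the score across zero.
    """
    score = linear_score(weights, bias, x)
    l1_norm = sum(abs(w) for w in weights)
    if l1_norm == 0:
        raise ValueError("a constant classifier cannot be flipped")
    eps = (abs(score) + margin) / l1_norm
    direction = -1 if score > 0 else 1
    x_adv = [xi + direction * eps * sign(w) for w, xi in zip(weights, x)]
    return x_adv, eps


def demo(dim):
    """Hypothetical setup: equal weights, bias chosen so the clean score is about 1."""
    weights = [0.1] * dim
    bias = 1.0 - 0.1 * dim
    x = [1.0] * dim
    x_adv, eps = minimal_linf_attack(weights, bias, x)
    return predict(weights, bias, x), predict(weights, bias, x_adv), eps


if __name__ == "__main__":
    for dim in (10, 10_000):
        print(dim, demo(dim))
```

In this toy setting the clean score is held at roughly 1 while the dimension grows, so the per-feature change needed to flip the decision shrinks in proportion to the number of features. That is one simple way to see why high-dimensional decision spaces are easier to attack with perturbations too small to notice.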
- News Article
20
- 10.1016/s2589-7500(19)30011-1
- May 1, 2019
- The Lancet Digital Health
Is the future of medical diagnosis in computer algorithms?
- Research Article
45
- 10.3389/fnins.2022.883385
- Jun 24, 2022
- Frontiers in Neuroscience
Explainable artificial intelligence aims to bring transparency to artificial intelligence (AI) systems by translating, simplifying, and visualizing its decisions. While society remains skeptical about AI systems, studies show that transparent and explainable AI systems can help improve the Human-AI trust relationship. This manuscript presents two studies that assess three AI decision visualization attribution models that manipulate morphological clarity (MC) and two information presentation-order methods to determine each visualization’s impact on the Human-AI trust relationship through increased confidence and cognitive fit (CF). The first study, N = 206 (Avg. age = 37.87 ± 10.51, Male = 123), utilized information presentation methods and visualizations delivered through an online experiment to explore trust in AI by asking participants to complete a visual decision-making task. The second study, N = 19 (24.9 ± 8.3 years old, Male = 10), utilized eye-tracking technology and the same stimuli presentation methods to investigate if cognitive load, inferred through pupillometry measures, mediated the confidence-trust relationship. The results indicate that low MC positively impacts Human-AI trust and that the presentation order of information within an interface in terms of adjacency further influences user trust in AI. We conclude that while adjacency and MC significantly affect cognitive load, cognitive load alone does not mediate the confidence-trust relationship. Our findings interpreted through a combination of CF, situation awareness, and ecological interface design have implications for the design of future AI systems, which may facilitate better collaboration between humans and AI-based decision agents.
- Research Article
- 10.30574/ijsra.2022.7.2.0275
- Dec 30, 2022
- International Journal of Science and Research Archive
The advent of synthetic cognition—defined as the capacity of artificial intelligence (AI) systems to simulate human-like reasoning, learning, and decision-making—has begun to profoundly reshape medical care pathways. From diagnostics and prognosis to personalized treatment planning and robotic surgery, AI-driven tools are no longer peripheral but integral collaborators in clinical environments. This paper adopts a broad-to-narrow analytical framework to critically examine how synthetic cognition is influencing human-machine collaboration across the continuum of care. At a broader level, the integration of AI systems into healthcare infrastructures challenges conventional assumptions about medical authority, clinical expertise, and the epistemology of care. AI systems are increasingly capable of real-time data interpretation, pattern recognition, and predictive modeling, contributing to decision-making processes in ways that blur the lines between human judgment and machine output. As AI becomes more embedded in clinical routines, the need to recalibrate the roles and relationships between healthcare professionals and intelligent systems becomes urgent. Narrowing the focus, this study evaluates specific instances of human-AI interaction within care pathways—such as in radiology, oncology, and intensive care—highlighting both the benefits and ethical challenges. It explores the implications for clinical responsibility, trust-building, cognitive delegation, and shared accountability. Special attention is given to the tensions between algorithmic opacity and the need for transparent, explainable AI systems that support human oversight rather than replace it. By engaging with interdisciplinary perspectives from medical ethics, cognitive science, and systems theory, this paper offers a nuanced assessment of how synthetic cognition redefines collaboration in medicine. 
It ultimately argues for the development of hybrid governance frameworks that enable safe, effective, and ethically aligned human-machine partnerships in healthcare.
- Research Article
1
- 10.60087/jaigs.v4i1.78
- Apr 23, 2024
- Journal of Artificial Intelligence General science (JAIGS) ISSN:3006-4023
This paper explores the feasibility of constructing interpretable artificial intelligence (AI) systems rooted in active inference and the free energy principle. Initially, we offer a concise introduction to active inference, emphasizing its relevance to modeling decision-making, introspection, and the generation of both overt and covert actions. Subsequently, we delve into how active inference can serve as a foundation for designing explainable AI systems. Specifically, it enables us to capture essential aspects of "introspective" processes and generate intelligible models of decision-making mechanisms. We propose an architectural framework for explainable AI systems employing active inference. Central to this framework is an explicit hierarchical generative model that enables the AI system to monitor and elucidate the factors influencing its decisions. Importantly, this model's structure is designed to be understandable and verifiable by human users. We elucidate how this architecture can amalgamate diverse data sources to make informed decisions in a transparent manner, mirroring aspects of human consciousness and introspection. Finally, we examine the implications of our findings for future AI research and discuss potential ethical considerations associated with developing AI systems with (apparent) introspective capabilities.
- Research Article
- 10.60087/jaigs.vol4.issue1.p26
- Apr 23, 2024
- Journal of Artificial Intelligence General science (JAIGS) ISSN:3006-4023
This paper explores the feasibility of constructing interpretable artificial intelligence (AI) systems rooted in active inference and the free energy principle. Initially, we offer a concise introduction to active inference, emphasizing its relevance to modeling decision-making, introspection, and the generation of both overt and covert actions. Subsequently, we delve into how active inference can serve as a foundation for designing explainable AI systems. Specifically, it enables us to capture essential aspects of "introspective" processes and generate intelligible models of decision-making mechanisms. We propose an architectural framework for explainable AI systems employing active inference. Central to this framework is an explicit hierarchical generative model that enables the AI system to monitor and elucidate the factors influencing its decisions. Importantly, this model's structure is designed to be understandable and verifiable by human users. We elucidate how this architecture can amalgamate diverse data sources to make informed decisions in a transparent manner, mirroring aspects of human consciousness and introspection. Finally, we examine the implications of our findings for future AI research and discuss potential ethical considerations associated with developing AI systems with (apparent) introspective capabilities.
- Book Chapter
9
- 10.1007/978-981-15-1465-4_51
- Dec 19, 2019
Several recent studies have shown that artificial intelligence (AI) systems can be made to malfunction by deliberately crafted data entering through the normal route. For example, a well-crafted sticker attached to a traffic sign can lead a self-driving car to misinterpret the sign's meaning. Such deliberately crafted data, which cause the AI system to misjudge, are called adversarial examples. The problem is that current AI systems are not robust enough to defend against adversarial examples when an attacker uses them as a means of attack. Therefore, much research on detecting and removing adversarial examples is now under way. In this paper, we propose the use of the deep image prior (DIP) as a defense method against adversarial examples using only the adversarial noisy image. This is in contrast with other neural network based adversarial noise removal methods, where many adversarial noisy and true images have to be used for the training of the neural network. Experimental results show the validity of the proposed approach.
- Research Article
140
- 10.1016/j.isci.2020.101515
- Aug 29, 2020
- iScience
Summary: The recent sale of an artificial intelligence (AI)-generated portrait for $432,000 at Christie's art auction has raised questions about how credit and responsibility should be allocated to individuals involved and how the anthropomorphic perception of the AI system contributed to the artwork's success. Here, we identify natural heterogeneity in the extent to which different people perceive AI as anthropomorphic. We find that differences in the perception of AI anthropomorphicity are associated with different allocations of responsibility to the AI system and credit to different stakeholders involved in art production. We then show that perceptions of AI anthropomorphicity can be manipulated by changing the language used to talk about AI (as a tool versus agent), with consequences for artists and AI practitioners. Our findings shed light on what is at stake when we anthropomorphize AI systems and offer an empirical lens to reason about how to allocate credit and responsibility to human stakeholders.
- Research Article
- 10.1200/jco.2025.43.16_suppl.e13650
- Jun 1, 2025
- Journal of Clinical Oncology
e13650 Background: To evaluate and compare the diagnostic performance of a deep learning-based artificial intelligence (AI) system versus three radiologists in the detection of breast cancer using digital mammography, specifically within the context of Uzbekistan, and to determine if AI can serve as a reliable tool in this setting. Methods: This retrospective study utilized a dataset of mammograms, sourced from Uzbekistan, which were independently assessed by three radiologists and an AI system. The AI model, based on deep neural networks, was designed for automated breast cancer detection. The radiologists’ interpretations and the AI predictions were compared against a reference standard of biopsy results. The primary outcome measures included the area under the receiver operating characteristic curve (AUC), accuracy, and specificity for both the AI system and radiologists. The data underwent rigorous statistical analysis to establish the significance of the observed differences. The model was trained using data from multiple institutions in multiple countries. Results: The AI system demonstrated a significantly higher area under the curve (AUC of 0.89) compared to the average of three radiologists (AUC of 0.82). The AI also showed higher specificity (e.g., 93.0% versus 77.6%), and the recall rate for AI was three times lower than that of radiologists. The AI was more sensitive in detecting cancers with mass, distortion, or asymmetry and better at detecting T1 or node-negative cancers. This result underscores AI's potential to reduce false positives, but also demonstrates that it can detect cancers missed by radiologists. The AI system's performance aligns with other studies showing AI sensitivity to be non-inferior to, or surpassing, radiologists. AI systems can detect more cancers with mass or distortion than radiologists. 
The statistical analysis showed that the AI system achieved robust accuracy and demonstrated potential as a reliable tool to enhance breast cancer screening outcomes. A study also showed that AI can reduce the number of reads in a screening program by 41.4%. Conclusions: In this study, the AI system outperformed the group of radiologists in terms of AUC, specificity, recall rates, and positive predictive value. These findings suggest that deep learning-based AI can significantly improve the detection of breast cancer in mammography and may serve as a valuable tool in the Uzbekistan healthcare setting. Additional studies that include larger, more heterogeneous datasets are warranted, and it is important to continue researching AI integration, including risk management and real-world follow-up of performance. Future studies should examine the impact of AI on screening performance when used by radiologists and assess the value of different models for various conditions.
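For readers less familiar with the outcome measures named in this abstract, the sketch below shows how AUC, sensitivity, and specificity are computed from biopsy-style reference labels and model scores. The labels, scores, and the 0.45 decision threshold are invented toy values, not data from the study, and the function names are illustrative.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means cancer."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(y_true, y_pred):
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive case outscores a random negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 3 biopsy-positive and 4 biopsy-negative cases.
Y_TRUE = [1, 1, 1, 0, 0, 0, 0]
SCORES = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2, 0.1]
THRESHOLD = 0.45  # hypothetical operating point
Y_PRED = [1 if s >= THRESHOLD else 0 for s in SCORES]

if __name__ == "__main__":
    print("AUC:", auc(Y_TRUE, SCORES))
    print("sensitivity:", sensitivity(Y_TRUE, Y_PRED))
    print("specificity:", specificity(Y_TRUE, Y_PRED))
```

AUC does not depend on the threshold, whereas sensitivity and specificity do. Comparing an AI system with human readers on specificity therefore also depends on the operating point chosen for the AI.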
- Research Article
14
- 10.1111/ajo.13661
- Apr 1, 2023
- Australian and New Zealand Journal of Obstetrics and Gynaecology
Artificial intelligence (AI) is the simulation of human intelligence in machines that are programmed to think and learn like humans. AI has the potential to revolutionise the way that healthcare professionals diagnose, treat, and manage conditions affecting the female reproductive system. Machine learning (ML) is a subset of AI which deals with the development of algorithms and statistical models that enable computers to learn from and make predictions or decisions without being explicitly programmed to do so. Deep learning (DL) is a subfield of ML that utilises neural networks with multiple layers, known as deep neural networks (DNNs), to learn from data. DNNs are inspired by the structure and function of the human brain and are capable of automatically learning high-level features from raw data, such as images, audio and text. DL has been very successful in various applications such as image and speech recognition, natural language processing and computer vision. ML algorithms can be divided into three categories: supervised learning, unsupervised learning, and reinforcement learning. Supervised learning algorithms are trained on a labelled dataset, where the desired output (label) is already known. Unsupervised learning algorithms are trained on an unlabelled dataset and are used to discover patterns or relationships in the data. Reinforcement learning algorithms are trained using a trial-and-error approach, where the agent receives a reward or penalty for its actions. The goal of reinforcement learning is to learn a policy that maximises the expected reward over time. AI and ML are increasingly being applied in the field of obstetrics and gynaecology, with the potential to improve diagnostic accuracy, patient outcomes, and efficiency of care. AI has been applied to the field of medicine for several decades. 
One of the earliest examples of AI in medicine was the development of MYCIN in the 1970s, a computer program that could diagnose bacterial infections and recommend appropriate antibiotic treatments. MYCIN was developed by a team at Stanford University led by Edward Shortliffe, and its success demonstrated the potential of AI in medical decision making. In the 1980s, AI-based expert systems such as DXplain, developed at Massachusetts General Hospital, were used to assist in the diagnosis of diseases. These early AI systems were based on rule-based systems and were limited in their capabilities. One of the earliest examples of AI was the development of computer-aided diagnostic systems for ultrasound images in the 1970s and 1980s. These systems were designed to assist radiologists in identifying fetal anomalies and other conditions. In recent years, there has been a renewed interest in the use of AI in obstetrics and gynaecology, driven by advances in ML and the availability of large amounts of data. One of the primary areas in which AI and ML are being used in obstetrics and gynaecology is in the analysis of imaging data, such as ultrasound and magnetic resonance imaging. AI algorithms can be trained to automatically identify and classify different structures in the images, such as the placenta or fetal organs, with high accuracy. Another area of focus is the use of AI to predict preterm birth. Researchers have used ML algorithms to analyse data from electronic health records and identify patterns that are associated with preterm birth. By analysing large datasets of patient information and outcomes, AI algorithms can identify patterns and risk factors that may not be apparent to human analysts. This can help to improve the prediction of obstetric outcomes and guide clinical decision making. In recent years, AI has also been applied in obstetrics and gynaecology for real-time monitoring of high-risk pregnancies and identifying fetal distress. 
These systems use ML algorithms to analyse data from fetal heart rate monitors and identify patterns that are associated with fetal distress. AI and ML are also being used to develop new tools for the management of gynaecological conditions, such as endometriosis and fibroids. These tools can be used to predict the progression of the disease and guide treatment decisions. One example of the use of AI in benign gynaecology is the development of computer-aided diagnostic systems for endometriosis. These systems use ML algorithms to analyse images of the pelvic region and identify the presence of endometrial tissue, which can be a sign of endometriosis. Another area where AI and ML are being applied is in the management of fibroids. ML algorithms are being used to analyse imaging data and predict the growth and behaviour of fibroids, which can aid in the development of personalised treatment plans. In the field of oncology, AI is being used to improve the accuracy and speed of cancer diagnosis. AI algorithms can analyse images of tissue samples to identify the presence of cancer cells and predict the likelihood of a positive outcome following treatment. AI algorithms can be trained to analyse images from pelvic scans and identify signs of ovarian cancer with high accuracy. In addition to these specific applications, AI and ML are also being used to improve the efficiency and organisation of care in obstetrics and gynaecology. For example, by analysing large amounts of clinical data, AI algorithms can be used to identify patients at high risk of complications, prioritise them for care and ensure that they receive the appropriate level of care in a timely manner. AI and ML have the potential to revolutionise the field of fertility and in vitro fertilisation (IVF). By using data from large patient populations, AI and ML algorithms can help identify patterns and predict outcomes that would be difficult for human experts to discern. 
This can lead to improvements in diagnosis, treatment planning, and overall success rates for patients undergoing IVF. One area where AI and ML are being applied is in the selection of embryos for transfer during IVF. By analysing images of embryos, AI and ML algorithms can predict which embryos are most likely to result in a successful pregnancy. Another area where AI and ML have shown potential is in the optimisation of culture conditions for embryos. This has the potential to improve the survival and development of embryos, leading to higher pregnancy rates. AI and ML are also being used to improve the timing of embryo transfer during IVF. By analysing data from patient medical histories, AI and ML algorithms can predict the optimal time for transfer to increase the chances of successful pregnancies. In addition to these applications, AI and ML are being used in other areas of fertility and IVF to improve patient outcomes. For example, AI and ML are being used to predict the likelihood of ovarian reserve, predict ovulation timing, and improve the efficiency and cost-effectiveness of fertility clinics. AI and ML are rapidly evolving fields that have the potential to revolutionise the field of surgery. These technologies can be used to assist surgeons in a variety of ways, from pre-operative planning to real-time guidance during procedures. One of the key areas where AI and ML are being applied in surgery is in image analysis. For example, algorithms can be used to automatically segment and identify structures in medical images, such as tumours or blood vessels. This can help surgeons plan procedures more accurately and reduce the risk of complications. Another area where AI and ML are being used in surgery is in the development of robotic systems. These systems can be programmed to perform specific tasks, such as suturing or cutting tissue, with a high degree of precision and accuracy. 
In addition, robotic systems can be equipped with sensors that provide real-time feedback to the surgeon, which can help to improve the outcome of the procedure. These systems can be programmed with advanced algorithms that allow them to make precise incisions, control bleeding, and minimise tissue damage. AI and ML can also be used to improve the efficiency and safety of surgical procedures. For example, algorithms can be trained to analyse data from vital signs monitors, such as heart rate and blood pressure, and alert surgeons to potential complications in real-time. AI and ML are also being used to assist with post-operative care. For example, algorithms can be used to analyse patient data and predict which patients are at risk of complications, such as infection or bleeding, allowing surgeons to take preventative measures. Overall, AI and ML have the potential to significantly improve the field of surgery by increasing accuracy and precision, reducing the risk of complications, and improving patient outcomes. As the technology continues to advance, it is likely that we will see an increasing number of AI-assisted surgical systems and applications in clinical practice. In gynaecology specifically, there is a scarcity of data and diversity in the data. This can lead to AI models that are not generalisable to certain populations or that make incorrect predictions for certain groups of patients. Overall, AI has the potential to improve the diagnosis and management of obstetrics and gynaecology conditions, and many studies have shown that AI systems can perform at least as well as human experts in several areas. However, it is important to note that AI and ML are still in the early stages of development in obstetrics and gynaecology and more research is needed to fully understand their potential benefits and limitations. 
Some of the key challenges facing the field include developing AI systems that can explain their decisions, improving the robustness of AI systems to adversarial attacks, and developing AI systems that can operate in a wide range of environments. However, it is important to note that AI is a complementary tool to the obstetrics and gynaecology specialist and it is not meant to replace human expertise.

The preceding text is entirely a product of an AI system. The preceding review, Artificial Intelligence in Gynaecology: An Overview, was composed and written by an evolutionary AI system, ChatGPT (Chat Generative Pre-trained Transformer). ChatGPT is an AI chatbot underpinned by the GPT architecture, an autoregressive language model that uses DL to produce human-like text. The system was trained on a dataset of over 500 GB of text data derived from books, articles, and websites prior to 2021. The system can engage in responsive dialogue, generate computer code, and produce coherent and fluent text.1 ChatGPT was conceived by OpenAI, an AI laboratory based in San Francisco, California, founded by Elon Musk and Sam Altman in 2015. Since its public release on November 30, 2022, the potential for use and misuse has exponentially grown,2 ultimately leading multiple organisations, including schools and universities, to prohibit the use of AI systems.

Prompted by this interest in AI, the aim of this study was to assess the capacity of ChatGPT to generate a scientific review. In January 2023, a multidisciplinary study group was assembled to develop the study protocol, confirm the methodology and approve the topic. This research was exempt from ethics review under National Health and Medical Research Council guidelines.3 ChatGPT was instructed to generate a narrative review based on dialogue with the lead author, AY. The input was informed by collaborative meetings of the study group over the study period.
The study group nominated the topic, 'Artificial Intelligence in Gynaecology', but ChatGPT generated the title, structure and content for this paper. The study group defined the input parameters for ChatGPT and each AI output was reviewed by the authors for consistency and context, informing the next input. The dialogue thus became increasingly specific and refined in each iteration, as the initial general outline was expanded to include specific subheadings, academic language and academic references. The review was finalised from the ChatGPT output through an explicit composition protocol, limiting assembly to cut and paste, deletion to whole sentences (but not words) and conversion to Australian English. No grammatical or syntax correction was performed. The AI output was cross-referenced and verified by the study group.

In this study, ChatGPT generated 7112 words in over 15 iterations, including 32 references. The output was restricted to the final review of 1809 words and nine unique references after removing duplicates (4) and incorrect references (19). The final paper was submitted for blinded peer review.

Thus, this study has demonstrated the capacity of an AI system, such as ChatGPT, to generate a scientific review through human academic instruction. AI is anticipated to expand the boundaries of evidence-based medicine through the potential of comprehensive analysis and summation of scientific publications. However, unlike systematic reviews or meta-analyses governed by explicit methodology, AI systems such as ChatGPT are the product of DL algorithms that are dependent upon the quality of the input to train the AI. Consequently, unlike systematic reviews, AI systems are bound by the bias, breadth, depth and quality of the training material. A dedicated medical AI would therefore be trained on an appropriate data set, such as the National Library of Medicine Medline/PubMed database.
However, the volume of data is challenging: in 2022 alone, there were over 33 million citations equating to a dataset of almost 200 GB for the minimum dataset. In contrast, ChatGPT has no external reference capabilities, such as access to the internet, search engines or any other sources of information outside of its own model. If forced outside of this framework, ChatGPT may generate plausible-sounding but incorrect or nonsensical responses.4 Most notably, pushing the AI to include references leads the system to generate bizarre fabrications.5 Our paper demonstrated that only 28% (9/32) of the references were authentic, although this was better than the 11% reported in a recent paper.6 In contrast to human writing, AI-generated content is more likely to be of limited depth, contain factual errors and fabricated references, and repeat the instructions used to seed the output.7 The latter results in a formulaic language redundancy that all but identifies AI content. The human authors thus echo the conclusion of ChatGPT that AI is a complementary tool to the specialist and not meant to replace human expertise. For the moment.

The authors report no conflicts of interest.
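
The reference and word counts reported in this study can be checked with a short tally. The sketch below is illustrative only: the variable names are hypothetical, and it assumes the reported figures are 32 generated references, of which 4 were duplicates and 19 were incorrect, together with 7112 generated words and 1809 retained words.

```python
# Figures as reported in the study. The duplicate count of 4 is an
# assumption about how the reported numbers break down.
generated_refs = 32
duplicate_refs = 4
incorrect_refs = 19

# References left after removing duplicates and incorrect entries.
authentic_refs = generated_refs - duplicate_refs - incorrect_refs
authentic_share = authentic_refs / generated_refs

# Share of the generated text that was kept in the final review.
generated_words = 7112
retained_words = 1809
retained_share = retained_words / generated_words

print(f"authentic references: {authentic_refs} of {generated_refs} ({authentic_share:.0%})")
print(f"retained words: {retained_words} of {generated_words} ({retained_share:.0%})")
```

Under these assumptions the tally gives nine authentic references, which is consistent with the 9/32 proportion quoted above, and shows that roughly a quarter of the generated words were retained.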