Articles published on the Turing test
1000 Search results
- New
- Research Article
- 10.1016/j.chbah.2025.100209
- Dec 1, 2025
- Computers in Human Behavior: Artificial Humans
- Costanza Cenerini + 4 more
Artistic turing test: The challenge of differentiating human and AI-generated art
- New
- Research Article
- 10.2196/76896
- Nov 25, 2025
- JMIR Formative Research
- Christoph Raphael Buhr + 10 more
Background: Large language models (LLMs) have great potential to improve and make the work of clinicians more efficient. Previous studies have mainly focused on web-based services, such as ChatGPT, often with simulated cases. For the processing of personalized patient data, web-based services raise major data protection concerns. Ensuring compliance with data protection and medical device regulations therefore remains a critical challenge for adopting LLMs in clinical settings. Objective: This retrospective single-center study aimed to evaluate locally run LLMs (Gemma 2, Mistral Nemo, and Llama 3) in providing diagnoses and treatment recommendations for real-world outpatient cases in otorhinolaryngology (ORL). Methods: Outpatient cases (n=30) from regular consultation hours and the emergency service at a university hospital ORL outpatient department were randomly selected. Documentation by ORL doctors, including anamnesis and examination results, was passed to the locally run LLMs (Gemma 2, Mistral Nemo, and Llama 3), which were asked to provide diagnostic and treatment strategies. Recommendations of the LLMs and the treating ORL doctors were rated by 3 experienced ORL consultants on a 6-point Likert scale for medical adequacy, conciseness, coherence, and comprehensibility. Moreover, consultants were asked whether the answers pose a risk to the patient’s safety. A modified Turing test was performed to distinguish responses generated by LLMs from those of doctors. Finally, the potential influence of the information generated by the LLMs on the raters’ own diagnosis and treatment opinions was evaluated. Results: Over all categories, ORL doctors achieved superior (P<.0005) ratings compared to locally run LLMs (Llama 3, Mistral Nemo, and Gemma 2). ORL doctors’ responses were considered hazardous for patients in only 1% of the ratings, whereas recommendations by Llama 3, Gemma 2, and Mistral Nemo were considered hazardous in 54%, 47%, and 32% of cases, respectively.
According to the raters, the LLMs’ information rarely influenced their judgment, doing so for Mistral Nemo, Gemma 2, and Llama 3 in 1%, 3%, and 4% of the ratings, respectively. Conclusions: Although locally run LLMs still underperform compared with their web-based counterparts, they achieved respectable results on outpatient treatment in this study. Nevertheless, the retrospective and single-center nature of the study, along with the clinicians’ documentation style, may have introduced bias in favor of human recommendations. In the future, locally run LLMs will help address data protection concerns; however, further refinement and prospective validation are still needed to meet strict medical device requirements. As locally run LLMs continue to evolve, they are likely to become comparably powerful to web-based LLMs and to become established as useful tools to support doctors in clinical practice.
- New
- Research Article
- 10.1038/s41598-025-27712-4
- Nov 18, 2025
- Scientific Reports
- Marius S Knorr + 4 more
Action-oriented approaches to cognition which emphasize the constitutive role of sensorimotor patterns for perception are gaining importance for the study of cognitive processes in the human brain as well as for endowing artificial agents with cognitive capabilities. It is still debated whether motor-based action-effect contingencies can be extended to social contexts. Here, we investigate the hypothesis that social sensorimotor contingencies (socSMCs) substantially contribute to successful social interaction, and that endowing an artificial agent with socSMCs could make it an interaction partner evaluated like a human. We studied a variant of a Turing test in which human participants had to decide whether they interacted with an artificial agent or another human. To disguise the true nature of the partner, movements were mapped to standardized avatars who interacted in a virtual environment. Depending on individual traits of the participants and the duration of the interaction, in about 74% of instances participants correctly identified the interaction partner. Subjects were less likely to detect an artificial agent the more they focused on the joint task rather than on the partner. Our results suggest that the subjective experience of physical social interaction to a significant extent accrues from basic sensorimotor patterns.
- Research Article
- 10.1109/mc.2025.3600889
- Nov 1, 2025
- Computer
- Hal Berghel
A Generative AI Perspective on the Turing Test: Passed but Not Vindicated
- Research Article
- 10.1088/1361-6560/ae13ce
- Oct 28, 2025
- Physics in Medicine & Biology
- Shilun Du + 4 more
Objective: Contrast-Enhanced Computed Tomography (CECT) is a critical medical imaging modality, yet acquiring and annotating such datasets remains time-consuming. Generative models show potential in augmenting datasets, but existing methods mainly focus on single-organ CECT with small deformations and struggle to generate diverse data with large deformations. We aim to propose a novel biomechanics-guided CECT volume synthesis model for generating deformed CECT volumes and to evaluate the effectiveness of deformation-augmented CECT datasets for downstream tasks. Approach: First, we develop a biomechanics-guided deformable CECT volume synthesis framework using deformation as input to a Conditional Generative Adversarial Network, and using sequential deformations to further generate temporally consistent deformed CECT volumes. Second, we propose a module for transition-region generation and contrast adjustment in CECT. Third, we trained the deformable synthesis model on liver and kidney CECT datasets and used it for dataset augmentation. The fidelity of the synthesized CECT volumes was verified through qualitative and quantitative tests, and the effectiveness of the augmented datasets was evaluated on downstream tasks, including segmentation and multi-organ deformable image registration. Main Results: For image fidelity, the mean Dice Similarity Coefficient (DSC) and Structural Similarity Index Measure for the continuity of the synthesized CECT volumes are 0.838 and 0.988, higher than for the real CT volumes. Our method outperforms existing approaches in comparative experiments. The specificity and sensitivity in the radiologist Turing test are 47.5% and 48.0%. Comparison between deformed ex vivo porcine liver CT and synthesized CECT shows the model generates realistic deformed CT. In segmentation, the model trained on augmented datasets achieves a mean mAP@50 score of 0.641, outperforming 0.399 without augmentation. In deformable image registration, DSC improves by 7% as the number of augmented training frames increases. Significance:
The proposed model can synthesize deformable CECT volumes, augmenting dataset diversity and size. The synthesized CECT volumes show good volume continuity and perceptual similarity to real CECT, and the augmented datasets improve performance on downstream tasks.
- Research Article
- 10.1021/acs.jcim.5c01692
- Oct 17, 2025
- Journal of chemical information and modeling
- Anis Ismail + 2 more
The generation and evaluation of chemical reactions remain challenging, with few comprehensive studies addressing these issues. We introduce the Chemical Reaction (Rxn) Systematic Assessment of Generation and Evaluation (ChemRxnSAGE) framework, an adaptable end-to-end approach for evaluating the quality, validity, and diversity of machine-generated chemical reactions. Combining automated validity filters with quality metrics and expert insights, ChemRxnSAGE systematically eliminates invalid reactions. We tested its robustness using generative models, including Recurrent Neural Networks and Variational Autoencoders, followed by validation using a chemical "Turing test" with domain experts. Additionally, we assess reaction feasibility through thermodynamic analysis and compare the generated reactions against the existing literature to ensure relevance and novelty. By combining computational tools with expert-driven metrics, ChemRxnSAGE offers a comprehensive and extendable solution that advances the state of chemical reaction generation and evaluation.
- Research Article
- 10.47852/bonviewmedin52026362
- Oct 14, 2025
- Medinformatics
- Victor Chigbundu Nwaiwu + 1 more
Today, artificial intelligence (AI) is one of the hottest buzzwords in technology. It is at the center of the global technological revolution, envisaged to replace or enhance human capabilities in the coming times. With AI projected to be one of the major disrupting forces of the future, this article engages with several scientific sources to highlight the step-by-step progress made since the inception of AI, from the Turing test to the much-celebrated launch of ChatGPT (generative pre-trained transformer), the evolution of medical imaging (from early X-ray techniques to sophisticated AI-driven systems), and the current research landscape, examining how AI can revolutionize radiology practice, while also pointing out pitfalls and future research directions. AI was found to be very useful across every aspect of the radiology work chain (diagnostic and therapeutic components alike), such as scheduling and worklist management, image segmentation and classification, diagnosis, image measurement and assessment, image acquisition and reconstruction, and prediction. However, ongoing concerns were seen around cost, hardware limitations, data quality and quantity, bias, data privacy, training of users, transparency, and regulatory oversight. Several recommendations were then made, including extensive model training on large, diverse datasets and validation; creative research to address the black-box phenomenon; AI integration with both virtual and augmented reality to improve models’ robustness; regular user training and interdisciplinary collaboration; and the development of regulatory frameworks (covering data governance, transparency, cybersecurity, ethical issues, and post-market surveillance).
It is foreseen that the concerned authorities, furnished by this review with knowledge of the historical antecedents, will take the necessary action to address these concerns, taking into consideration AI strategy, AI engineering, stakeholder engagement, and regulatory/ethical concerns.
- Research Article
- 10.1016/j.dcm.2025.100915
- Oct 1, 2025
- Discourse, Context & Media
- Otto Segersven + 1 more
Can a machine talk the talk though not climb the rock? A Turing Test on rock climbing
- Research Article
- 10.1111/cogs.70126
- Oct 1, 2025
- Cognitive Science
- Charlotte O Brand + 2 more
Understanding our ideological opponents is crucial for the effective exchange of arguments, the avoidance of escalation, and the reduction of conflict. We operationalize the idea of an “Ideological Turing Test” to measure the accuracy with which people represent the arguments of their ideological opponents. Crucially, this offers a behavioral measure of open-mindedness that goes beyond mere self-report. We recruited 200 participants from each of the opposite sides of three topics with potential for polarization in the UK of the early 2020s (1,200 participants total). Participants were asked to provide reasons both for and against their position. Their reasons were then rated by participants from the opposite side. Our criterion for “passing” the test was that an argument was agreed with by opponents to the same extent as, or more than, arguments made by proponents. We found evidence for high levels of mutual understanding across all three topics. We also found that those who passed were more open-minded toward their opponents, in that they were less likely to rate them as ignorant, immoral, or irrational. Our method provides a behavioral measure of open-mindedness and of the ability to mimic counterpartisan perspectives that goes beyond self-report measures. Our results offer encouragement that, even in highly polarized debates, high levels of mutual understanding persist.
- Research Article
- 10.1038/s41598-025-17188-7
- Sep 2, 2025
- Scientific reports
- Feng Xiao + 1 more
Recent advances in large language models (LLMs) have highlighted their potential to predict human decisions. In two studies, we compared predictions by GPT-3.5 and GPT-4 across 51 scenarios (9,600 responses) against published data from 2,104 human participants within an evolutionary-psychology framework. We further examined our findings with GPT-4o across eight social-group and kinship conditions (1,600 responses). Our results revealed behavioral differences between humans and the LLMs’ predictions: humans showed a greater sensitivity to kinship and group size than the LLMs when making life-death decisions, whereas the LLMs aligned more closely with humans’ higher risk-seeking preference in financial domains. While human choices followed Prospect theory’s value function (risk-averse in gains, risk-seeking in losses), LLMs often predicted reversed patterns. GPT-3.5 matched the average level of human risk preference but showed reversed framing effects; GPT-4 was indiscriminately risk-averse across social contexts. While humans were more risk-seeking in small or kin groups than in large groups, GPT-4o made the opposite predictions. Our results suggest a set of criteria for a psychological version of the Turing Test reflected in framing effects and social context-dependent risk preference involving kinship, group size, social relations, sense of fairness, self-age awareness, public vs. personal properties, and social group-dependent aspiration levels.
- Research Article
- 10.1016/j.compbiomed.2025.110678
- Sep 1, 2025
- Computers in biology and medicine
- Mehmet Demirel + 4 more
EmiNet: Moving bacteria detection on optical endomicroscopy images trained on synthetic data.
- Research Article
- 10.1371/journal.pmen.0000426
- Aug 29, 2025
- PLOS Mental Health
- S Gabe Hatch + 17 more
Correction: When ELIZA meets therapists: A Turing test for the heart and mind
- Abstract
- 10.1192/j.eurpsy.2025.590
- Aug 26, 2025
- European Psychiatry
- E Molchanova
Introduction: The development of artificial intelligence (AI) has led to significant advancements in various fields, including mental health applications. As AI technologies like ChatGPT continue to evolve, questions have arisen about whether AI can eventually develop a true personality, and what implications this might have for fields such as psychology and psychiatry. Isaac Asimov’s ideas about AI and the Turing test have gained renewed attention, yet these frameworks do not address core psychological components such as empathy, emotion, and personal interaction, which are key elements in therapeutic settings. Objectives: This article explores whether AI could develop a personality and replace human therapists in psychological counseling and psychiatry. Specifically, it aims to evaluate AI’s current capabilities in providing emotional and psychological support and to address whether AI can evolve to meet therapeutic practice’s deeper, human-centered requirements. Methods: The analysis is based on a review of current AI applications in mental health, such as AI-based therapy platforms for post-traumatic stress disorder (PTSD), which provide symptom management tools and promote adaptive coping strategies. These applications were compared to the human therapist’s role, focusing on emotional interaction, empathy, and the therapeutic relationship. Additionally, the philosophical and psychiatric aspects of personality formation in both humans and AI were examined. Results: AI systems have made progress in simulating therapeutic techniques, providing guidance, and mimicking emotional responses. They can support symptom relief and enhance coping strategies, especially in areas where human therapists are scarce. However, AI’s ability to engage with deeper aspects of the therapeutic process, such as emotional empathy and personal connection, remains limited.
AI lacks subjective experience, emotional depth, and self-awareness, which are essential factors for forming a genuine personality. Conclusions: While AI has the potential to augment clinical practice, it cannot replace the human element in therapy. The development of AI-based tools is valuable for symptom management, but psychotherapy is inherently rooted in human connection, intuition, and emotional engagement, qualities AI does not possess. For AI to truly replace human therapists or develop a personality, significant advancements in consciousness and emotional cognition would be required, which remain speculative at this stage. Thus, AI will likely continue to serve as a supportive tool rather than a replacement for human therapists in the foreseeable future. Disclosure of Interest: None declared.
- Research Article
- 10.1136/jme-2025-110885
- Aug 14, 2025
- Journal of medical ethics
- Jonathan Lewis + 1 more
Stem cell-based human embryo models (SCBEMs), generated in vitro from stem cells, currently exist outside the scope of regulatory frameworks that govern in vitro embryo research in most jurisdictions. A widely discussed proposal suggests using a 'Turing test' framework, whereby regulatory oversight is triggered if an SCBEM is found to be 'equivalent' to a human embryo. In this paper, we argue that such a proposal faces two major complications. First, sophisticated laboratory techniques such as trophoblast replacement allow researchers to manipulate normal embryogenesis, obscuring whether a given SCBEM meets embryo-like regulatory thresholds. Second, attempts to assess SCBEMs' developmental potential, especially through non-human analogues, rest on tenuous epistemic assumptions that may not align with human-specific developmental trajectories. Given SCBEMs' potential manipulability and uncertain biological and potentiality benchmarks, we argue that reliance on equivalence-based frameworks alone is highly problematic. We conclude by urging a cautious, flexible approach that recognises both the scientific promise of SCBEMs and the normative need to prevent the circumvention of regulatory safeguards.
- Research Article
- 10.33392/diam.1856
- Aug 11, 2025
- Diametros
- Paweł Łupkowski
This paper aims to present and discuss an argumentation against the Turing test (TT), which we shall call the CCSC (Complete Conversation System Claim). Exemplary arguments of the CCSC type include Lem’s “Space Gramophone,” the “machine equipped with a dictionary” proposed by Shannon and McCarthy, Block’s “Aunt Bubbles,” and Searle’s “Chinese Room” argument. CCSC argumentation is constructed to show that the TT is not properly designed and, consequently, is not a good hallmark of intelligence. Based on the original TT rules reconstruction, I argue that CCSC-type argumentation seems to be aimed at a certain interpretation of the TT, which, as I demonstrate, commits the straw man fallacy. In light of the results presented by Łupkowski and Wiśniewski, I also discuss whether a complete conversation system is theoretically possible.
- Research Article
- 10.1038/s41598-025-14477-z
- Aug 7, 2025
- Scientific reports
- Ha Kyung Jung + 6 more
This study investigated the effects of feature augmentation, which uses generated images with specific imaging features, on the performance of isocitrate dehydrogenase (IDH) mutation prediction models in gliomas. A total of 598 patients were included from our institution (310 training, 152 internal test) and the Cancer Genome Atlas (136 external test). Score-based diffusion models were used to generate T2-weighted, FLAIR, and contrast-enhanced T1-weighted image triplets. Three neuroradiologists independently assessed visual Turing tests and various morphological features. Multivariable logistic regression models were developed using real images, random augmented data, and feature-augmented datasets. While random augmentation yielded models with AUCs comparable to real image-based models, it led to reduced specificity, particularly in the external test set (specificity: 83.2% vs. 73.0%, P = .013). In contrast, feature-augmented models maintained stable diagnostic performance; however, when more than 70% of training images included synthetic T2-FLAIR mismatch signs, AUC decreased in the external test set (AUC: 0.905-0.906 for ≤ 70%; 0.902-0.876 for ≥ 80%). These findings highlight the value of phenotype-specific augmentation for IDH prediction, while emphasizing the need to optimize augmentation proportion to avoid performance degradation.
- Research Article
- 10.1016/j.oret.2025.08.002
- Aug 1, 2025
- Ophthalmology. Retina
- Ruoyu Chen + 6 more
Noninvasive Synthesis of Multiframe Ultra-Widefield Fluorescein Angiography from Color Fundus Photographs.
- Research Article
- 10.1016/j.chbah.2025.100191
- Aug 1, 2025
- Computers in Human Behavior: Artificial Humans
- Wolfgang Wagner + 5 more
Limits of ChatGPT's conversational pragmatics in a Turing test on ethics, commonsense, and cultural sensitivity
- Research Article
- 10.1177/08944393251364293
- Jul 30, 2025
- Social Science Computer Review
- Rafael C Alvarado
This essay introduces and develops the concept of imitative intelligence implied by Turing’s foundational work on machine intelligence and connects it to the current generation of AI agents based on large language models. Based on a close reading of Turing’s writings on machine intelligence, from the 1936 paper on the Entscheidungsproblem to the 1950 paper on the imitation game, the Turing test is found to be more than an operational convenience; it reflects an implicit theory of the imitative and social nature of intelligence that informs his entire project of intelligent machine design. Moreover, the purported shortcomings of the Turing test, its reliance on language and its emphasis on culture, turn out to be foundational to the success of LLM-based AI agents today. It is suggested that the design and regulation of these agents are usefully framed by the concept of imitative intelligence that they inherit.
- Research Article
- 10.17680/erciyesiletisim.1631208
- Jul 30, 2025
- Erciyes İletişim Dergisi
- Deniz Kurtyılmaz
In the modern era, language is not just a means of communication but a creative force shaping perception, identity, and power structures. It is central to knowledge production and reflects the structure of the human mind. Alex Garland’s Ex Machina (2014) explores this linguistic paradigm through an AI narrative, offering a profound inquiry into the nature of consciousness. AI narratives make the boundaries of human cognition and the decisive role of language more visible, and Ex Machina stands out as a key example questioning its semantic, syntactic, and pragmatic dimensions. The film reverses the Turing test, illustrating how language negotiates boundaries between humans and AI. Ava, the artificial intelligence, uses language not only for communication but also for manipulation, persuasion, and power regulation. Her discourse reveals that language is more than a tool for transmitting information—it actively shapes social contexts. This study adopts a philosophical analysis approach, closely examining the film’s dialogues, metaphors, and the strategic use of language. Ex Machina ultimately demonstrates that (artificial) intelligence is evaluated not just based on linguistic competence but also on its ability to manipulate and reshape human interactions, redefining the role of language in the relationship between technology and humanity.