The Epistemological AI Turn: From JTB to KnowledgeS

Abstract

In this paper, we examine whether large language models (LLMs) can be said to possess knowledge in the sense defined by the Justified True Belief (JTB) framework, and if not, whether any alternative form of knowledge can meaningfully be attributed to them. While LLMs perform impressively across various cognitive tasks—such as summarization, translation, and content generation—they lack belief, justification, and truth-evaluation, which are essential components of the JTB model. We argue that attributing human-like knowledge (in the JTB sense or its variants) to LLMs constitutes a category mistake. Accordingly, LLMs should not be regarded as epistemic agents with human-like capacities, but rather as machine tools that simulate certain functions of human cognition. We acknowledge, however, that when used critically and ethically, these tools can enhance human cognitive performance. To distinguish the capacities of LLMs from human cognitive agency, we introduce the term knowledgeS to denote the structured linguistic outputs produced by LLMs in response to complex cognitive tasks. We refer to the emergence of knowledgeS as marking an “epistemological AI turn.” Finally, we explore the theological implications of AI-generated knowledge. Because LLMs lack conscience and moral sense, they risk detaching knowledge from ethical grounding. Within normative traditions such as Christianity, knowledge is inseparable from moral responsibility rooted in the faith of a religious community. If AI-generated religious texts are mistaken for genuine spiritual insight, they may promote a form of “algorithmic gnosis”—content that mimics sacred language while remaining spiritually hollow. Such developments could erode the moral and spiritual depth of religious expression. As AI systems assume increasingly authoritative roles, society must guard against confusing knowledgeS with genuine, embodied, and ethically accountable knowing, which remains unique to human agency.

Similar Papers
  • Research Article
  • Cited by 5
  • 10.3390/knowledge5010003
Epistemology in the Age of Large Language Models
  • Feb 1, 2025
  • Knowledge
  • Jennifer Mugleston + 4 more

Epistemology and technology have been working in synergy throughout history. This relationship has culminated in large language models (LLMs). LLMs are rapidly becoming integral parts of our daily lives through smartphones and personal computers, and we are coming to accept the functionality of LLMs as a given. As LLMs become more entrenched in societal functioning, questions have begun to emerge: Are LLMs capable of real understanding? What is knowledge in LLMs? Can knowledge exist independently of a conscious observer? While these questions cannot be answered definitively, we can argue that modern LLMs are more than mere symbol-manipulators and that LLMs built on deep neural networks should be considered capable of a form of knowledge, though it may not qualify as justified true belief (JTB) in the traditional definition. This deep neural network design may have endowed LLMs with the capacity for internal representations, basic reasoning, and the performance of seemingly cognitive tasks, possible only through a compressive but generative form of representation that is best termed knowledge. In addition, the non-symbolic nature of LLMs renders them incompatible with the criticism posed by Searle’s “Chinese room” argument. These insights encourage us to revisit fundamental questions of epistemology in the age of LLMs, which we believe can advance the field.

  • Research Article
  • Cited by 2
  • 10.6001/fil-soc.2025.36.1.2
Large Language Models and the Enhancement of Human Cognition: Some Theoretical Insights
  • Mar 3, 2025
  • Filosofija. Sociologija
  • Aistė Diržytė

This essay explores the possible contribution of Large Language Models (LLMs) to human cognition. It investigates whether human cognition can be enhanced by advanced AI systems such as LLMs. Can LLMs make people as learners smarter, or, on the contrary, make them reason and think less? The author discusses the concepts of human and artificial intelligence and examines LLMs as advanced AI systems that use deep learning techniques and can be considered to excel in neural network architecture, data volume, generalisation, and scalability. The author suggests that while LLMs can assist with numerous cognitive tasks, more research and philosophical inquiry are needed to understand whether such AI assistance would lead people to cultivate their intelligence more, not less. Presumably, LLMs can contribute to human intelligence and cognition only under strict conditions (addressing existing limitations, questioning prompting, time sensitivity, etc.). However, it is important that these theoretical considerations be verified by experimental research.

  • Research Article
  • Cited by 117
  • 10.1001/jamanetworkopen.2023.46721
Performance of Large Language Models on a Neurology Board–Style Examination
  • Dec 7, 2023
  • JAMA Network Open
  • Marc Cicero Schubert + 2 more

Recent advancements in large language models (LLMs) have shown potential in a wide array of applications, including health care. While LLMs showed heterogeneous results across specialized medical board examinations, the performance of these models in neurology board examinations remains unexplored. This study aimed to assess the performance of LLMs on neurology board-style examinations. This cross-sectional study was conducted between May 17 and May 31, 2023. The evaluation used a question bank approved by the American Board of Psychiatry and Neurology and was validated with a small question cohort by the European Board for Neurology. All questions were categorized into lower-order (recall, understanding) and higher-order (apply, analyze, synthesize) questions based on the Bloom taxonomy for learning and assessment. The performance of ChatGPT versions 3.5 (LLM 1) and 4 (LLM 2) was assessed in relation to overall scores, question type, and topics, along with the confidence level and reproducibility of answers. The main outcome measures were the overall percentage scores of the 2 LLMs. LLM 2 significantly outperformed LLM 1, correctly answering 1662 of 1956 questions (85.0%) vs 1306 questions (66.8%) for LLM 1. Notably, LLM 2's performance was greater than the mean human score of 73.8%, effectively achieving near-passing and passing grades in the neurology board examination. LLM 2 outperformed human users in behavioral, cognitive, and psychological-related questions and demonstrated superior performance to LLM 1 in 6 categories. Both LLMs performed better on lower-order than higher-order questions, with LLM 2 excelling in both lower-order and higher-order questions. Both models consistently used confident language, even when providing incorrect answers. Reproducible answers of both LLMs were associated with a higher percentage of correct answers than inconsistent answers.
Despite the absence of neurology-specific training, LLM 2 demonstrated commendable performance, whereas LLM 1 performed slightly below the human average. While higher-order cognitive tasks were more challenging for both models, LLM 2's results were equivalent to passing grades in specialized neurology examinations. These findings suggest that LLMs could have significant applications in clinical neurology and health care with further refinements.
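As a quick arithmetic check, the headline percentages follow directly from the raw counts reported above. The short Python sketch below recomputes them from the 1956-question total and compares each model with the reported human mean of 73.8%; the helper name `percent` is our own and not part of the study.

```python
# Counts taken from the abstract: LLM 1 = ChatGPT 3.5, LLM 2 = ChatGPT 4.
TOTAL_QUESTIONS = 1956
CORRECT = {"LLM 1": 1306, "LLM 2": 1662}
HUMAN_MEAN_PCT = 73.8


def percent(n_correct: int, total: int) -> float:
    """Share of correctly answered questions, as a percentage rounded to one decimal."""
    return round(100 * n_correct / total, 1)


for model, n in CORRECT.items():
    pct = percent(n, TOTAL_QUESTIONS)
    relation = "above" if pct > HUMAN_MEAN_PCT else "below"
    print(f"{model}: {n}/{TOTAL_QUESTIONS} = {pct}% ({relation} the human mean of {HUMAN_MEAN_PCT}%)")
```

Running it reproduces the reported 85.0% for LLM 2 and 66.8% for LLM 1.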

  • Research Article
  • Cited by 2
  • 10.1038/s41598-025-22290-x
Judgments of learning distinguish humans from large language models in predicting memory
  • Oct 7, 2025
  • Scientific Reports
  • Markus Huff + 1 more

Large language models (LLMs) increasingly mimic human cognition in various language-based tasks. However, their capacity for metacognition—particularly in predicting memory performance—remains unexplored. Here, we introduce a cross-agent prediction model to assess whether ChatGPT-based LLMs align with human judgments of learning (JOL), a metacognitive measure where individuals predict their own future memory performance. We tested humans and LLMs on pairs of sentences, one of which was a garden-path sentence—a sentence that initially misleads the reader toward an incorrect interpretation before requiring reanalysis. By manipulating contextual fit (fitting vs. unfitting sentences), we probed how intrinsic cues (i.e., relatedness) affect both LLM and human JOL. Our results revealed that while human JOL reliably predicted actual memory performance, none of the tested LLMs (GPT-3.5-turbo, GPT-4-turbo, and GPT-4o) demonstrated comparable predictive accuracy. This discrepancy emerged regardless of whether sentences appeared in fitting or unfitting contexts. These findings indicate that, despite LLMs’ demonstrated capacity to model human cognition at the object-level, they struggle at the meta-level, failing to capture the variability in individual memory predictions. By identifying this shortcoming, our study underscores the need for further refinements in LLMs’ self-monitoring abilities, which could enhance their utility in educational settings, personalized learning, and human–AI interactions. Strengthening LLMs’ metacognitive performance may reduce the reliance on human oversight, paving the way for more autonomous and seamless integration of AI into tasks requiring deeper cognitive awareness.

  • Research Article
  • 10.1016/j.array.2026.100775
A systematic literature review of large language models in phishing attack generation and detection
  • Jul 1, 2026
  • Array
  • Dinushan Sivaneswaran + 5 more

Phishing attacks continue to grow in scale and sophistication, causing substantial financial losses and privacy breaches worldwide. Recent advances in large language models (LLMs) have brought significant changes to the generation and detection of phishing content. This study systematically investigates the dual role of LLMs in facilitating phishing attacks and strengthening countermeasures. Using the PRISMA methodology, the authors screened 142 records published between January 2023 and April 2025 and identified 36 eligible studies from major academic databases, including IEEE Xplore, ScienceDirect, ACM Digital Library, Web of Science, and Scopus. A comprehensive and rigorous analysis was conducted of research trends and themes over time, dataset characteristics, and the LLM architectures and models employed. The findings reveal that most studies relied on manually generated datasets rather than publicly available benchmark datasets, and that GPT-based models received considerably more attention than other LLM architectures. The review demonstrates that LLMs substantially enhance the generation of phishing content by producing coherent, contextually relevant, and persuasive email and website content. This capability lowers the technical barrier for attackers and potentially increases attack effectiveness. Conversely, LLMs also strengthen defensive strategies by enabling more effective analysis of textual and visual content for phishing detection. In many cases, LLM-based approaches outperform traditional machine learning and deep learning methods and, in certain contexts, approach or match human-level performance. Overall, the findings suggest that LLMs have accelerated and automated phishing-related processes, simultaneously intensifying the threat landscape and advancing defensive capabilities.
Highlights:
• The first in-depth study to review LLM usage in phishing attack generation and detection.
• The study reveals that LLMs have accelerated and automated phishing-related processes, elevating both threats and defence mechanisms.
• GenAI-based multimodal phishing attacks are on the rise due to the wider adoption of GenAI tools in general.

  • Research Article
  • 10.55041/ijsrem36608
Exploring Vulnerabilities and Threats in Large Language Models: Safeguarding Against Exploitation and Misuse
  • Aug 10, 2024
  • INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
  • Mr Aarush Varma + 1 more

This research paper delves into the inherent vulnerabilities and potential threats posed by large language models (LLMs), focusing on their implications across diverse applications such as natural language processing and data privacy. The study aims to identify and analyze these risks comprehensively, emphasizing the importance of mitigating strategies to prevent exploitation and misuse in LLM deployments. In recent years, LLMs have revolutionized fields like automated content generation, sentiment analysis, and conversational agents, yet their immense capabilities also raise significant security concerns. Vulnerabilities such as bias amplification, adversarial attacks, and unintended data leakage can undermine trust and compromise user privacy. Through a systematic examination of these challenges, this paper proposes safeguarding measures crucial for responsibly harnessing the potential of LLMs while minimizing associated risks. It underscores the necessity of rigorous security protocols, including robust encryption methods, enhanced authentication mechanisms, and continuous monitoring frameworks. Furthermore, the research discusses regulatory implications and ethical considerations surrounding LLM usage, advocating for transparency, accountability, and stakeholder engagement in policymaking and deployment practices. By synthesizing insights from current literature and real-world case studies, this study provides a comprehensive framework for stakeholders (developers, policymakers, and users) to navigate the complex landscape of LLM security effectively. Ultimately, this research aims to inform future advancements in LLM technology, ensuring its safe and beneficial integration into various domains while mitigating potential risks to individuals and society as a whole.
Keywords— Adversarial attacks on LLMs, Bias in LLMs, Data privacy in LLMs, Ethical considerations LLMs, Exploitation of LLMs, Large Language Models (LLMs), Misuse of LLMs, Mitigation strategies for LLMs, Natural Language Processing (NLP), Regulatory frameworks LLMs, Responsible deployment of LLMs, Risks of LLMs, Security implications of LLMs, Threats to LLMs, Vulnerabilities in LLMs.

  • Research Article
  • Cited by 10
  • 10.1089/cyber.2024.0409
Psychomatics: A Multidisciplinary Framework for Understanding Artificial Minds
  • Jun 30, 2025
  • Cyberpsychology, Behavior, and Social Networking
  • Giuseppe Riva + 4 more

Although large language models (LLMs) and other artificial intelligence systems demonstrate cognitive skills similar to humans, such as concept learning and language acquisition, the way they process information fundamentally differs from biological cognition. To better understand these differences, this article introduces Psychomatics, a multidisciplinary framework bridging cognitive science, linguistics, and computer science. It aims to delve deeper into the high-level functioning of LLMs, focusing specifically on how LLMs acquire, learn, remember, and use information to produce their outputs. To achieve this goal, Psychomatics will rely on a comparative methodology, starting from a theory-driven research question (is the process of language development and use different in humans and LLMs?) and drawing parallels between LLMs and biological systems. Our analysis shows how LLMs can map and manipulate complex linguistic patterns in their training data. Moreover, LLMs can follow Grice's Cooperative Principle to provide relevant and informative responses. However, human cognition draws from multiple sources of meaning, including experiential, emotional, and imaginative facets, which transcend mere language processing and are rooted in our social and developmental trajectories. In addition, current LLMs lack physical embodiment, reducing their ability to make sense of the intricate interplay between perception, action, and cognition that shapes human understanding and expression. Ultimately, Psychomatics holds the potential to yield transformative insights into the nature of language, cognition, and intelligence, both artificial and biological. Furthermore, by drawing parallels between LLMs and human cognitive processes, Psychomatics can inform the development of more robust and human-like artificial intelligence systems.

  • Research Article
  • Cited by 4
  • 10.1016/j.heliyon.2024.e38911
Challenging large language models’ “intelligence” with human tools: A neuropsychological investigation in Italian language on prefrontal functioning
  • Oct 1, 2024
  • Heliyon
  • Riccardo Loconte + 4 more

The Artificial Intelligence (AI) research community has used ad-hoc benchmarks to measure the “intelligence” level of Large Language Models (LLMs). In humans, intelligence is closely linked to the functional integrity of the prefrontal lobes, which are essential for higher-order cognitive processes. Previous research has found that LLMs struggle with cognitive tasks that rely on these prefrontal functions, highlighting a significant challenge in replicating human-like intelligence. In December 2022, OpenAI released ChatGPT, a new chatbot based on the GPT-3.5 model that quickly gained popularity for its impressive ability to understand and respond to human instructions, suggesting a significant step towards intelligent behaviour in AI. Therefore, to rigorously investigate LLMs' level of “intelligence,” we evaluated the GPT-3.5 and GPT-4 versions through a neuropsychological assessment using tests in the Italian language routinely employed to assess prefrontal functioning in humans. The same tests were also administered to Claude2 and Llama2 to verify whether similar language models perform similarly in prefrontal tests. When using human performance as a reference, GPT-3.5 showed inhomogeneous results on prefrontal tests, with some tests well above average, others in the lower range, and others frankly impaired. Specifically, we have identified poor planning abilities and difficulty in recognising semantic absurdities and understanding others' intentions and mental states. Claude2 exhibited a similar pattern to GPT-3.5, while Llama2 performed poorly in almost all tests. These inconsistent profiles highlight how LLMs' emergent abilities do not yet mimic human cognitive functioning. The sole exception was GPT-4, which performed within the normative range for all the tasks except planning. Furthermore, we showed how standardised neuropsychological batteries developed to assess human cognitive functions may be suitable for challenging LLMs’ performance.

  • Research Article
  • Cited by 35
  • 10.1162/opmi_a_00160
The Limitations of Large Language Models for Understanding Human Language and Cognition
  • Aug 31, 2024
  • Open Mind: Discoveries in Cognitive Science
  • Christine Cuskley + 2 more

Researchers have recently argued that the capabilities of Large Language Models (LLMs) can provide new insights into longstanding debates about the role of learning and/or innateness in the development and evolution of human language. Here, we argue on two grounds that LLMs alone tell us very little about human language and cognition in terms of acquisition and evolution. First, any similarities between human language and the output of LLMs are purely functional. Borrowing the "four questions" framework from ethology, we argue that what LLMs do is superficially similar, but how they do it is not. In contrast to the rich multimodal data humans leverage in interactive language learning, LLMs rely on immersive exposure to vastly greater quantities of unimodal text data, with recent multimodal efforts built upon mappings between images and text. Second, turning to functional similarities between human language and LLM output, we show that human linguistic behavior is much broader. LLMs were designed to imitate the very specific behavior of human writing; while they do this impressively, the underlying mechanisms of these models limit their capacities for meaning and naturalistic interaction, and their potential for dealing with the diversity in human language. We conclude by emphasising that LLMs are not theories of language, but tools that may be used to study language, and that can only be effectively applied with specific hypotheses to motivate research.

  • Research Article
  • Cited by 1
  • 10.1108/mlag-01-2025-0001
Large language models for automated grading in geotechnics
  • Nov 28, 2025
  • Machine Learning and Data Science in Geotechnics
  • Enrico Soranzo

Purpose: The purpose of this study is to explore the application of automated grading systems in geotechnics using large language models (LLMs) and cosine similarity for enhanced assessment and educational content generation. By training and testing LLMs on synthetic and real student data, the study seeks to develop robust systems for grading technical reports and open-ended questions, aligned with industry standards. Additionally, it aims to enhance student learning through auto-grading, immediate feedback and content generation, while addressing ethical considerations such as data privacy and fairness. Ultimately, the study strives to demonstrate the potential of LLMs to improve consistency, efficiency and educational outcomes.
Design/methodology/approach: The study employs a mixed-methods approach to develop and validate automated grading systems in geotechnics. Initially, correct answers were generated manually and synthetically using a generative pre-trained transformer model, with synthetic answers compared to correct ones via cosine similarity. Real student answers underwent similar evaluation. A Web-based tool was created to assess responses in real time, providing dynamic feedback. Additionally, LLMs were fine-tuned on geotechnics textbooks and validated using synthetic and real student data. Anonymized student project reports were graded automatically, showcasing the potential and limitations of LLMs in consistent grading and educational content generation. Ethical considerations were addressed throughout.
Findings: The study demonstrated the potential of LLMs in geotechnics education by developing ML-driven systems for grading and content generation. The grading system, using cosine similarity and LLMs, provided consistent and objective assessments comparable to human graders. Immediate feedback on open-ended questions enhanced learning outcomes, enabling students to address knowledge gaps effectively. Fine-tuning LLMs with geotechnics textbooks and industry standards facilitated the generation of accurate, relevant questions and answers, further improved by retrieval-augmented generation (RAG). Data augmentation techniques enhanced model robustness, while ethical considerations, including data privacy, fairness, and transparency, ensured responsible deployment and fostered trust among stakeholders.
Originality/value: This study offers originality and value by pioneering the application of LLMs and cosine similarity for automated grading in geotechnics education, a domain with limited exploration in educational technology. By integrating RAG and fine-tuning LLMs with domain-specific textbooks, it bridges the gap between advanced machine learning techniques and practical applications in engineering education. The development of real-time feedback tools and robust grading systems enhances both student learning and instructional efficiency. Furthermore, addressing ethical considerations such as fairness and data privacy sets a precedent for responsible artificial intelligence (AI) deployment, contributing to the broader adoption of AI in academia.
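The abstract names cosine similarity as the core comparison between student (or synthetic) answers and reference answers, but it does not say how answers are vectorized or how similarity becomes a mark. The Python sketch below is a minimal illustration under two assumptions of our own: answers are turned into simple bag-of-words term counts (a stand-in for whatever embeddings the study used), and the score is a hypothetical linear mapping of similarity onto a 10-point scale. The example sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Lower-cased bag-of-words term counts (assumed stand-in for learned embeddings)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); returns 0.0 if either vector is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def grade(student_answer: str, reference_answer: str, max_points: float = 10.0) -> float:
    """Hypothetical grading rule: scale similarity in [0, 1] linearly to max_points."""
    sim = cosine_similarity(term_vector(student_answer), term_vector(reference_answer))
    return round(sim * max_points, 1)


reference = "Effective stress equals total stress minus pore water pressure."
student = "The effective stress is the total stress minus the pore pressure."
print(grade(student, reference))  # prints 6.2
```

A purely lexical vector penalizes filler words ("the", "is") and misses paraphrases, which is one reason a real system would more likely compare dense embeddings; the cosine formula itself stays the same either way.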

  • Research Article
  • Cited by 1
  • 10.1186/s12912-025-04102-9
Editorial stances on large language models in leading nursing publications: a cross-sectional analysis
  • Nov 20, 2025
  • BMC Nursing
  • Xing Zhou + 3 more

Background: The rapid integration of large language models (LLMs) into scholarly publishing has created an urgent need for clear standards. This study aims to comprehensively analyze the editorial stances of leading nursing publications regarding the use of LLMs in manuscript preparation and peer assessment.
Methods: We conducted a cross-sectional analysis of the top 50 nursing publications according to their journal impact factor. Each publication’s website was systematically evaluated for directives concerning LLM use in authorship, content generation, image creation, and peer assessment. Journal metrics were also extracted to assess any correlation with policy adoption.
Results: Of the 50 publications, 35 (70%) had explicit LLM-related directives. There was strong agreement on permitting the use of LLMs for content generation (97%) and on prohibiting LLM authorship (94%). However, a significant divergence was found regarding AI-generated images, with 52% of publications prohibiting their use. Guidance on LLM use in peer assessment was also inconsistent, with 49% of publications prohibiting it. Policy adoption varied significantly by publisher (ranging from 20% to 100%). No statistical association was found between policy existence and journal impact metrics (p > 0.05).
Conclusions: Leading nursing publications exhibit a fractured landscape on LLM use. While foundational agreement exists on authorship and content generation, critical areas like image creation and peer assessment lack consistent standards. This ambiguity underscores the need for a more unified, transparent framework to guide ethical and responsible LLM integration in nursing scholarship.
Supplementary Information: The online version contains supplementary material available at 10.1186/s12912-025-04102-9.

  • Conference Article
  • Cited by 6
  • 10.1109/compsac61105.2024.00067
LLMChain: Blockchain-Based Reputation System for Sharing and Evaluating Large Language Models
  • Jul 2, 2024
  • Mouhamed Amine Bouchiha + 6 more

Large Language Models (LLMs) have witnessed rapid growth in both their capabilities for language understanding, generation, and reasoning and the challenges they raise. Despite their remarkable performance in natural language processing-based applications, LLMs are susceptible to undesirable and erratic behaviors, including hallucinations, unreliable reasoning, and the generation of harmful content. These flawed behaviors undermine trust in LLMs and pose significant hurdles to their adoption in real-world applications, such as legal assistance and medical diagnosis, where precision, reliability, and ethical considerations are paramount. These could also lead to user dissatisfaction, which is currently inadequately assessed and captured. Therefore, to effectively and transparently assess users' satisfaction and trust in their interactions with LLMs, we design and develop LLMChain, a decentralized blockchain-based reputation system that combines automatic evaluation with human feedback to assign contextual reputation scores that accurately reflect LLMs' behavior. LLMChain helps users and entities identify the most trustworthy LLM for their specific needs and provides LLM developers with valuable information to refine and improve their models. To our knowledge, this is the first time that a blockchain-based distributed framework for sharing and evaluating LLMs has been introduced. Implemented using emerging tools, LLMChain is evaluated across two benchmark datasets, showcasing its effectiveness and scalability in assessing seven different LLMs.

  • Research Article
  • Cited by 122
  • 10.1109/jiot.2024.3524255
EdgeShard: Efficient LLM Inference via Collaborative Edge Computing
  • May 15, 2025
  • IEEE Internet of Things Journal
  • Mingjin Zhang + 4 more

Large language models (LLMs) have shown great success in content generation and intelligent decision making for IoT systems. Traditionally, LLMs are deployed on the cloud, incurring prolonged latency, high bandwidth costs, and privacy concerns. More recently, edge computing has been considered promising in addressing such concerns because edge devices are closer to data sources. However, edge devices are constrained by limited resources and can hardly afford to run LLMs. Existing studies address this limitation by offloading heavy workloads from edge to cloud or by compressing LLMs via model quantization. These methods either still rely heavily on the remote cloud or suffer substantial accuracy loss. This work is the first to deploy LLMs in a collaborative edge computing environment, in which edge devices and cloud servers share resources and collaborate to perform LLM inference with high efficiency and no accuracy loss. We design EdgeShard, a novel approach to partition a computation-intensive LLM into affordable shards and deploy them on distributed devices. The partition and distribution are nontrivial, considering device heterogeneity, bandwidth limitations, and model complexity. To this end, we formulate an adaptive joint device selection and model partition problem and design an efficient dynamic programming algorithm to optimize the inference latency and throughput. Extensive experiments with the popular Llama2 series of models on a real-world testbed reveal that EdgeShard achieves up to 50% latency reduction and 2× throughput improvement over the state-of-the-art.
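The abstract describes EdgeShard's joint device selection and model partition only at a high level, so the following is a simplified Python sketch of the general idea, not the paper's formulation. Assumptions (all ours): the model is a chain of layers split into contiguous shards; devices are considered in a fixed list order, each used at most once (a device may be skipped); a shard must fit in its device's memory; and latency for one forward pass is compute time plus one activation transfer per shard boundary. Pipelining and the throughput objective are ignored. `Device`, `partition_layers`, and every cost parameter are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    speed: float   # compute units per second (hypothetical cost model)
    memory: float  # memory capacity, same units as layer_mem


def partition_layers(layer_cost, layer_mem, devices, act_size, bandwidth):
    """Dynamic program over (layers done, last device) minimizing one-pass latency.

    Returns (latency, [(device_name, first_layer, end_layer_exclusive), ...]).
    bandwidth[p][d] is the assumed link speed from device p to device d.
    """
    n_layers, n_dev = len(layer_cost), len(devices)
    # Prefix sums: cost and memory of layers [k, i) become O(1) lookups.
    pc, pm = [0.0], [0.0]
    for c, m in zip(layer_cost, layer_mem):
        pc.append(pc[-1] + c)
        pm.append(pm[-1] + m)

    # best[i][d]: min latency to finish layers [0, i) with the last shard on device d.
    best = [[math.inf] * n_dev for _ in range(n_layers + 1)]
    choice = [[None] * n_dev for _ in range(n_layers + 1)]
    for i in range(1, n_layers + 1):
        for d, dev in enumerate(devices):
            for k in range(i):
                if pm[i] - pm[k] > dev.memory:
                    continue  # shard [k, i) does not fit on this device
                compute = (pc[i] - pc[k]) / dev.speed
                if k == 0:
                    options = [(compute, None)]  # first shard: no incoming transfer
                else:
                    # Only earlier devices may precede d, so no device is reused.
                    options = [
                        (best[k][p] + act_size / bandwidth[p][d] + compute, p)
                        for p in range(d)
                        if best[k][p] < math.inf
                    ]
                for cand, p in options:
                    if cand < best[i][d]:
                        best[i][d], choice[i][d] = cand, (k, p)

    d_end = min(range(n_dev), key=lambda d: best[n_layers][d])
    if best[n_layers][d_end] == math.inf:
        raise ValueError("no feasible partition under the memory limits")

    # Walk the recorded choices backwards to recover the shards.
    shards, i, d = [], n_layers, d_end
    while d is not None:
        k, p = choice[i][d]
        shards.append((devices[d].name, k, i))
        i, d = k, p
    return best[n_layers][d_end], shards[::-1]


# Usage: four unit-cost layers that cannot fit on either device alone.
devices = [Device("edge-device", 1.0, 2.0), Device("edge-server", 1.0, 2.0)]
print(partition_layers([1, 1, 1, 1], [1, 1, 1, 1], devices, 0.5, [[1.0, 1.0], [1.0, 1.0]]))
```

Fixing the device order keeps the search polynomial (roughly layers² × devices² transitions); a full joint device selection over arbitrary orderings, as the paper's problem statement suggests, would need a richer state or a heuristic.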

  • Research Article
  • Cited by 38
  • 10.1007/s10676-024-09777-3
A phenomenology and epistemology of large language models: transparency, trust, and trustworthiness
  • Jun 18, 2024
  • Ethics and Information Technology
  • Richard Heersmink + 3 more

This paper analyses the phenomenology and epistemology of chatbots such as ChatGPT and Bard. These chatbots are built on large language models (LLMs), which are generative artificial intelligence (AI) systems trained on a massive dataset of text extracted from the Web. We conceptualise these LLMs as multifunctional computational cognitive artifacts, used for various cognitive tasks such as translating, summarizing, answering questions, information-seeking, and much more. Phenomenologically, LLMs can be experienced as a “quasi-other”; when that happens, users anthropomorphise them. For most users, current LLMs are black boxes, i.e., for the most part, they lack data transparency and algorithmic transparency. They can, however, be phenomenologically and informationally transparent, in which case there is an interactional flow. Anthropomorphising and interactional flow can, in some users, create an attitude of (unwarranted) trust towards the output LLMs generate. We conclude this paper by drawing on the epistemology of trust and testimony to examine the epistemic implications of these dimensions. Whilst LLMs generally generate accurate responses, we observe two epistemic pitfalls. Ideally, users should be able to match the level of trust that they place in LLMs to the degree that LLMs are trustworthy. However, both their data and algorithmic opacity and their phenomenological and informational transparency can make it difficult for users to calibrate their trust correctly. The effects of these limitations are twofold: users may adopt unwarranted attitudes of trust towards the outputs of LLMs (which is particularly problematic when LLMs hallucinate), and the trustworthiness of LLMs may be undermined.

  • Research Article
  • Cited by 7
  • 10.1016/j.intell.2024.101858
Evidence of interrelated cognitive-like capabilities in large language models: Indications of artificial general intelligence or achievement?
  • Aug 29, 2024
  • Intelligence
  • David Ilić + 1 more

Large language models (LLMs) are advanced artificial intelligence (AI) systems that can perform a variety of tasks commonly found in human intelligence tests, such as defining words, performing calculations, and engaging in verbal reasoning. There are also substantial individual differences in LLM capacities. Given the consistent observation of a positive manifold and general intelligence factor in human samples, along with group-level factors (e.g., crystallised intelligence), we hypothesized that LLM test scores may also exhibit positive inter-correlations, which could potentially give rise to an artificial general ability (AGA) factor and one or more group-level factors. Based on a sample of 591 LLMs and scores from 12 tests aligned with fluid reasoning (Gf), domain-specific knowledge (Gkn), reading/writing (Grw), and quantitative knowledge (Gq), we found strong empirical evidence for a positive manifold and a general factor of ability. Additionally, we identified a combined Gkn/Grw group-level factor. Finally, the number of LLM parameters correlated positively with both general factor of ability and Gkn/Grw factor scores, although the effects showed diminishing returns. We interpreted our results to suggest that LLMs, like human cognitive abilities, may share a common underlying efficiency in processing information and solving problems, though whether LLMs manifest primarily achievement/expertise rather than intelligence remains to be determined. Finally, while models with greater numbers of parameters exhibit greater general cognitive-like abilities, akin to the connection between greater neuronal density and human general intelligence, other characteristics must also be involved.
