Enhancing Robot Task Planning and Execution through Multi-Layer Large Language Models

Abstract

Large language models (LLMs) have proven useful for robot task planning and task decomposition. Applying them directly to instruct robots in task execution, however, raises several challenges: the models struggle with more intricate tasks, interact with the environment only with difficulty, and the machine control instructions they generate are often not practically executable. In response, this work proposes a multi-layer large language model to strengthen a robot's ability to handle complex tasks. The proposed model decomposes tasks layer by layer across multiple LLMs, with the goal of improving the accuracy of task planning. Within the decomposition process, a visual language model is introduced as a sensor for environment perception; its output is fed into the LLM, combining the task objectives with environmental information and yielding robot motion plans tailored to the current environment. To improve the executability of the planner's output, a semantic alignment method aligns task-planning descriptions with the functional requirements of robot motion, refining the compatibility and coherence of the generated instructions. The approach is validated on an experimental platform built around an intelligent unmanned vehicle, which empirically verifies the proficiency of the multi-layer model in both robot task planning and execution.
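To make the pipeline in the abstract concrete, the sketch below illustrates the layered-decomposition and perception ideas only. It is a minimal illustration, not the authors' implementation: the `llm` and `vlm` functions are hypothetical stubs that return canned strings, and the prompt format is invented.

```python
from typing import List

def llm(prompt: str) -> str:
    """Hypothetical LLM stub; a real system would query a language model."""
    task = prompt.rsplit("Decompose: ", 1)[-1]
    if " and " in task:                  # pretend the model splits compound tasks
        left, right = task.split(" and ", 1)
        return left + "\n" + right
    return task                          # pretend atomic tasks pass through

def vlm(image: bytes) -> str:
    """Hypothetical VLM stub acting as the environment-perception sensor."""
    return "a corridor with a door about 5 m ahead and a chair on the left"

def decompose(task: str, scene: str, layers: int = 2) -> List[str]:
    """Layer-by-layer decomposition: each layer re-prompts the model with the
    previous layer's subtasks plus the VLM scene description."""
    steps = [task]
    for _ in range(layers):
        refined: List[str] = []
        for step in steps:
            reply = llm(f"Scene: {scene}\nDecompose: {step}")
            refined += [s.strip() for s in reply.splitlines() if s.strip()]
        steps = refined
    return steps

if __name__ == "__main__":
    scene = vlm(b"<camera frame>")       # perception feeds planning
    print(decompose("drive to the door and report any obstacle", scene))
```

In a real system the stubs would call actual model endpoints, and a semantic alignment stage (sketched after the Highlights below) would map each leaf step onto an executable skill.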

Highlights

  • Grounded in experiential learning and knowledge accumulation, humans demonstrate a remarkable ability to comprehend intricate tasks through simple communication

  • We described these five tasks using natural language and employed the framework proposed in this paper to enable the robot to accomplish navigation tasks

  • This work designs a feedback mechanism for the task decomposition process that solves the problem of aligning the task decomposition with the robot's control instructions (see the sketch after this list)
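The feedback mechanism in the last highlight, together with the abstract's semantic alignment method, can be pictured as a loop that accepts a decomposed step only once it maps onto a skill the robot actually exposes. A minimal sketch under stated assumptions: the skill list is invented, and difflib string similarity stands in for the paper's semantic matching.

```python
import difflib
from typing import Callable, List, Optional

# Hypothetical control primitives exposed by the unmanned vehicle.
SKILLS = ["move_forward", "turn_left", "turn_right", "stop", "detect_object"]

def align(step: str) -> Optional[str]:
    """Map a free-text step to the closest skill name, or None if nothing fits."""
    key = step.lower().strip().replace(" ", "_")
    hits = difflib.get_close_matches(key, SKILLS, n=1, cutoff=0.6)
    return hits[0] if hits else None

def plan_with_feedback(steps: List[str], rephrase: Callable[[str], str]) -> List[str]:
    """Feedback loop: a step that aligns to no skill is sent back for rewriting."""
    program: List[str] = []
    for step in steps:
        skill, tries = align(step), 0
        while skill is None and tries < 3:       # bounded retry budget
            step = rephrase(step)                # LLM restates in skill terms
            skill, tries = align(step), tries + 1
        if skill is None:
            raise ValueError(f"unalignable step: {step}")
        program.append(skill)
    return program

if __name__ == "__main__":
    canned = lambda s: "detect object"           # canned LLM rewrite for the demo
    print(plan_with_feedback(["move forward", "scan the area", "stop"], canned))
```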


Summary

Introduction

Grounded in experiential learning and knowledge accumulation, humans demonstrate a remarkable ability to comprehend intricate tasks through simple communication. Exploiting the inherent augmentation capability of large language models enables the decomposition of tasks into multiple subtasks of reduced complexity [1]. This capability can be harnessed for task planning in robotic systems, leading to more efficient and seamless human–robot interaction. Prevailing studies have concentrated predominantly on tasks of low complexity, such as robotic-arm trajectory planning and object handling. While these investigations have contributed significantly to a theoretical framework for large-model-based robot control, they fall short on tasks of elevated complexity: expanding to more intricate tasks requires the robot to handle complex environmental information and integrate it into the task planning process.

Methods
Findings
Discussion
Conclusion
References (showing 10 of 20 papers)

  • Self-Instruct: Aligning Language Models with Self-Generated Instructions · Yizhong Wang + 6 more · Jan 1, 2023 · doi:10.18653/v1/2023.acl-long.754 · Cited by 254
  • Code as Policies: Language Model Programs for Embodied Control · Jacky Liang + 7 more · May 29, 2023 · doi:10.1109/icra48891.2023.10160591 · Cited by 207 · Open Access
  • Sequence-of-Constraints MPC: Reactive Timing-Optimal Control of Sequential Manipulation · Marc Toussaint + 4 more · Oct 23, 2022 · doi:10.1109/iros47612.2022.9982236 · Cited by 15 · Open Access
  • Unsupervised Commonsense Question Answering with Self-Talk · Vered Shwartz + 4 more · Jan 1, 2020 · doi:10.18653/v1/2020.emnlp-main.373 · Cited by 76 · Open Access
  • Learning with Latent Language · Jacob Andreas + 2 more · Jan 1, 2018 · doi:10.18653/v1/n18-1197 · Cited by 60 · Open Access
  • StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects · Weiyu Liu + 3 more · May 23, 2022 · doi:10.1109/icra46639.2022.9811931 · Cited by 30 · Open Access
  • MDETR - Modulated Detection for End-to-End Multi-Modal Understanding · Aishwarya Kamath + 5 more · Oct 1, 2021 · doi:10.1109/iccv48922.2021.00180 · Cited by 421 · Open Access
  • PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World · Rowan Zellers + 6 more · Jan 1, 2021 · doi:10.18653/v1/2021.acl-long.159 · Cited by 15 · Open Access
  • Robots That Use Language · Stefanie Tellex + 3 more · Annual Review of Control, Robotics, and Autonomous Systems · May 3, 2020 · doi:10.1146/annurev-control-101119-071628 · Cited by 132 · Open Access
  • No, to the Right · Yuchen Cui + 5 more · Mar 13, 2023 · doi:10.1145/3568162.3578623 · Cited by 26 · Open Access

Citations (showing 9 of 9 papers)

Scene Graph-Enhanced Embodied Decision Making for Autonomous Object Search
  • Book Chapter · Jan 1, 2025 · Yachao Wang + 4 more · doi:10.1007/978-981-96-2260-3_47

Probing Augmented Intelligent Human–Robot Collaborative Assembly Methods Toward Industry 5.0
  • Research Article · Electronics · Aug 5, 2025 · Qingwei Nie + 7 more · doi:10.3390/electronics14153125

Facing the demands of Human–Robot Collaborative (HRC) assembly for complex products under Industry 5.0, this paper proposes an intelligent assembly method that integrates Large Language Model (LLM) reasoning with Augmented Reality (AR) interaction. To address issues such as poor visibility, difficulty in knowledge acquisition, and strong decision dependency in the assembly of complex aerospace products within confined spaces, an assembly task model and structured process information are constructed. Combined with a retrieval-augmented generation mechanism, the method realizes knowledge reasoning and optimization suggestion generation. An improved ORB-SLAM2 algorithm is applied to achieve virtual–real mapping and component tracking, further supporting the development of an enhanced visual interaction system. The proposed approach is validated through a typical aerospace electronic cabin assembly task, demonstrating significant improvements in assembly efficiency, quality, and human–robot interaction experience, thus providing effective support for intelligent HRC assembly.
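The retrieval-augmented generation mechanism this abstract mentions can be illustrated in a few lines: score stored process snippets against the query and prepend the best matches to the prompt. Everything below (the document store, the token-overlap scoring, the prompt format) is a hypothetical stand-in, not the paper's implementation.

```python
from typing import List

# Hypothetical store of structured assembly-process snippets.
DOCS: List[str] = [
    "Torque bolts on the cabin frame to 8 Nm in a cross pattern.",
    "Route the harness before mounting the cover plate.",
    "Verify connector J12 seating with a pull test.",
]

def score(query: str, doc: str) -> int:
    """Naive relevance score: shared lowercase tokens (a real system would
    use vector search over learned embeddings)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def build_prompt(query: str, k: int = 2) -> str:
    """Retrieve the top-k snippets and prepend them to the user query."""
    top = sorted(DOCS, key=lambda d: score(query, d), reverse=True)[:k]
    context = "\n".join(f"- {d}" for d in top)
    return f"Reference steps:\n{context}\n\nQuestion: {query}"

print(build_prompt("What torque for the cabin frame bolts?"))
```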

Generative Artificial Intelligence as Driver for Innovation in the Automotive Industry – A Systematic Analysis
  • Book Chapter · Jan 1, 2025 · Laura Bischoff + 1 more · doi:10.1007/978-3-658-46485-1_6

ReplanVLM: Replanning Robotic Tasks With Visual Language Models
  • Research Article · IEEE Robotics and Automation Letters · Nov 1, 2024 · Aoran Mei + 3 more · doi:10.1109/lra.2024.3471457 · Cited by 6 · Open Access

Large language models (LLMs) have gained increasing popularity in robotic task planning due to their exceptional abilities in text analytics and generation, as well as their broad knowledge of the world. However, they fall short in decoding visual cues. LLMs have limited direct perception of the world, which leads to a deficient grasp of the current state of the world. By contrast, the emergence of visual language models (VLMs) fills this gap by integrating visual perception modules, which can enhance the autonomy of robotic task planning. Despite these advancements, VLMs still face challenges, such as the potential for task execution errors, even when provided with accurate instructions. To address such issues, this paper proposes a ReplanVLM framework for robotic task planning. In this study, we focus on error correction interventions. An internal error correction mechanism and an external error correction mechanism are presented to correct errors under corresponding phases. A replan strategy is developed to replan tasks or correct error codes when task execution fails. Experimental results on real robots and in simulation environments have demonstrated the superiority of the proposed framework, with higher success rates and robust error correction capabilities in open-world tasks. Videos of our experiments are available at https://youtu.be/NPk2pWKazJc.
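The internal/external correction split described above can be read as two checkpoints around execution: critique the plan before acting, then replan when real execution fails. The sketch below is a schematic reading of the abstract, with all callables as canned stubs rather than the authors' code.

```python
from typing import Callable, List

def run_with_replanning(
    task: str,
    plan: Callable[[str], List[str]],
    critique: Callable[[List[str]], str],   # internal check, before acting
    execute: Callable[[List[str]], bool],   # external check, real outcome
    max_rounds: int = 3,
) -> List[str]:
    """Plan, critique, execute, and replan on failure."""
    for _ in range(max_rounds):
        candidate = plan(task)
        issue = critique(candidate)
        if issue:                            # internal error correction
            task += f"\nKnown issue to fix: {issue}"
            continue
        if execute(candidate):               # external error correction
            return candidate
        task += "\nThe previous plan failed during execution; replan."
    raise RuntimeError("no successful plan within the retry budget")

# Canned demo: the first plan is rejected internally, the second succeeds.
attempts = iter([["grasp(cup)"], ["open(gripper)", "grasp(cup)"]])
print(run_with_replanning(
    "pick up the cup",
    plan=lambda t: next(attempts),
    critique=lambda p: "" if "open(gripper)" in p else "gripper is closed",
    execute=lambda p: True,
))
```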

Large Language Models for Robotics: A Systematic Literature Review on Prompt Engineering
  • Book Chapter · Jan 1, 2025 · Jakob Wolber + 5 more · doi:10.1007/978-3-031-84744-8_17

EMPOWER: Embodied Multi-role Open-vocabulary Planning with Online Grounding and Execution
  • Conference Article · Oct 14, 2024 · Francesco Argenziano + 4 more · doi:10.1109/iros58592.2024.10802251 · Open Access

Task planning for robots in real-life settings presents significant challenges. These challenges stem from three primary issues: the difficulty in identifying grounded sequences of steps to achieve a goal; the lack of a standardized mapping between high-level actions and low-level commands; and the challenge of maintaining low computational overhead given the limited resources of robotic hardware. We introduce EMPOWER, a framework designed for open-vocabulary online grounding and planning for embodied agents aimed at addressing these issues. By leveraging efficient pre-trained foundation models and a multi-role mechanism, EMPOWER demonstrates notable improvements in grounded planning and execution. Quantitative results highlight the effectiveness of our approach, achieving an average success rate of 0.73 across six different real-life scenarios using a TIAGo robot.

A Robotic AI Algorithm for Fusing Generative Large Models in Agriculture Internet of Things
  • Research Article · IEEE Internet of Things Journal · May 15, 2025 · Guangyu Hou + 7 more · doi:10.1109/jiot.2024.3516729

From insight to autonomous execution: VLM-enhanced embodied agents towards digital twin-assisted human-robot collaborative assembly
  • Research Article · Robotics and Computer-Integrated Manufacturing · Apr 1, 2026 · Changchun Liu + 5 more · doi:10.1016/j.rcim.2025.103176

Research on the task decision algorithm for unmanned aerial vehicles based on the robot operating system and large language models
  • Conference Article · Apr 18, 2025 · Yuntong Cai + 3 more · doi:10.1117/12.3064972

Similar Papers
Evaluating the performance of large language & visual-language models in cervical cytology screening
  • Research Article · npj Precision Oncology · May 23, 2025 · Qi Hong + 15 more · doi:10.1038/s41698-025-00916-7

Large language models (LLMs) and large visual-language models (LVLMs) have exhibited near-human levels of knowledge, image comprehension, and reasoning abilities, and their performance has undergone evaluation in some healthcare domains. However, a systematic evaluation of their capabilities in cervical cytology screening has yet to be conducted. Here, we constructed CCBench, a benchmark dataset dedicated to the evaluation of LLMs and LVLMs in cervical cytology screening, and developed a GPT-based semi-automatic evaluation pipeline to assess the performance of six LLMs (GPT-4, Bard, Claude-2.0, LLaMa-2, Qwen-Max, and ERNIE-Bot-4.0) and five LVLMs (GPT-4V, Gemini, LLaVA, Qwen-VL, and ViLT) on this dataset. CCBench comprises 773 question-answer (QA) pairs and 420 visual-question-answer (VQA) triplets, making it the first dataset in cervical cytology to include both QA and VQA data. We found that LLMs and LVLMs demonstrate promising accuracy and specialization in cervical cytology screening. GPT-4 achieved the best performance on the QA dataset, with an accuracy of 70.5% for close-ended questions and average expert evaluation score of 6.9/10 for open-ended questions. On the VQA dataset, Gemini achieved the highest accuracy for close-ended questions at 67.8%, while GPT-4V attained the highest expert evaluation score of 6.1/10 for open-ended questions. Besides, LLMs and LVLMs revealed varying abilities in answering questions across different topics and difficulty levels. However, their performance remains inferior to the expertise exhibited by cytopathology professionals, and the risk of generating misinformation could lead to potential harm. Therefore, substantial improvements are required before these models can be reliably deployed in clinical practice.

Learning to reason over scene graphs: a case study of finetuning GPT-2 into a robot language model for grounded task planning
  • Research Article · Frontiers in Robotics and AI · Aug 15, 2023 · Georgia Chalvatzaki + 5 more · doi:10.3389/frobt.2023.1221739 · Cited by 13

Long-horizon task planning is essential for the development of intelligent assistive and service robots. In this work, we investigate the applicability of a smaller class of large language models (LLMs), specifically GPT-2, in robotic task planning by learning to decompose tasks into subgoal specifications for a planner to execute sequentially. Our method grounds the input of the LLM on the domain that is represented as a scene graph, enabling it to translate human requests into executable robot plans, thereby learning to reason over long-horizon tasks, as encountered in the ALFRED benchmark. We compare our approach with classical planning and baseline methods to examine the applicability and generalizability of LLM-based planners. Our findings suggest that the knowledge stored in an LLM can be effectively grounded to perform long-horizon task planning, demonstrating the promising potential for the future application of neuro-symbolic planning methods in robotics.
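Grounding the LLM input on a scene graph, as this abstract describes, amounts to serializing the graph into the prompt. A minimal sketch with an invented triple format and prompt wording:

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]   # (subject, relation, object)

def graph_to_prompt(goal: str, triples: List[Triple]) -> str:
    """Serialize scene-graph triples into a textual context for the planner."""
    facts = "; ".join(f"{s} {r} {o}" for s, r, o in triples)
    return (f"Scene facts: {facts}.\n"
            f"Goal: {goal}\n"
            f"List the subgoals, one per line.")

scene: List[Triple] = [("apple", "on", "table"), ("knife", "in", "drawer")]
print(graph_to_prompt("slice the apple", scene))
```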

Rethinking VLMs and LLMs for image classification
  • Research Article · Scientific Reports · Jun 4, 2025 · Avi Cooper + 10 more · doi:10.1038/s41598-025-04384-8

Visual Language Models (VLMs) are now increasingly being merged with Large Language Models (LLMs) to enable new capabilities, particularly in terms of improved interactivity and open-ended responsiveness. While these are remarkable capabilities, the contribution of LLMs to enhancing the longstanding key problem of classifying an image among a set of choices remains unclear. Through extensive experiments involving seven models, ten visual understanding datasets, and multiple prompt variations per dataset, we find that, for object and scene recognition, VLMs that do not leverage LLMs can achieve better performance than VLMs that do. Yet at the same time, leveraging LLMs can improve performance on tasks requiring reasoning and outside knowledge. In response to these challenges, we propose a pragmatic solution: a lightweight fix involving a relatively small LLM that efficiently routes visual tasks to the most suitable model for the task. The LLM router undergoes training using a dataset constructed from more than 2.5 million examples of pairs of visual task and model accuracy. Our results reveal that this lightweight fix surpasses or matches the accuracy of state-of-the-art alternatives, including GPT-4V and HuggingGPT, while improving cost-effectiveness.
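The proposed lightweight fix can be caricatured as a routing policy that sends each visual task to whichever model family historically performs better on it. In the paper the router is a small LLM trained on over 2.5 million (task, accuracy) pairs; the lookup table below is an invented stand-in for that learned policy.

```python
# Hypothetical routing policy: task type -> model family.
ROUTES = {
    "object_recognition": "vlm_without_llm",
    "scene_recognition": "vlm_without_llm",
    "reasoning": "vlm_with_llm",
    "outside_knowledge": "vlm_with_llm",
}

def route(task_type: str) -> str:
    """Pick the model family for a visual task, defaulting to the LLM-backed one."""
    return ROUTES.get(task_type, "vlm_with_llm")

assert route("object_recognition") == "vlm_without_llm"
assert route("visual_entailment") == "vlm_with_llm"   # unseen type -> default
```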

Visual Adversarial Examples Jailbreak Aligned Large Language Models
  • Research Article · Proceedings of the AAAI Conference on Artificial Intelligence · Mar 24, 2024 · Xiangyu Qi + 5 more · doi:10.1609/aaai.v38i19.30150 · Cited by 36

Warning: this paper contains data, prompts, and model outputs that are offensive in nature. Recently, there has been a surge of interest in integrating vision into Large Language Models (LLMs), exemplified by Visual Language Models (VLMs) such as Flamingo and GPT-4. This paper sheds light on the security and safety implications of this trend. First, we underscore that the continuous and high-dimensional nature of the visual input makes it a weak link against adversarial attacks, representing an expanded attack surface of vision-integrated LLMs. Second, we highlight that the versatility of LLMs also presents visual attackers with a wider array of achievable adversarial objectives, extending the implications of security failures beyond mere misclassification. As an illustration, we present a case study in which we exploit visual adversarial examples to circumvent the safety guardrail of aligned LLMs with integrated vision. Intriguingly, we discover that a single visual adversarial example can universally jailbreak an aligned LLM, compelling it to heed a wide range of harmful instructions (that it otherwise would not) and generate harmful content that transcends the narrow scope of a `few-shot' derogatory corpus initially employed to optimize the adversarial example. Our study underscores the escalating adversarial risks associated with the pursuit of multimodality. Our findings also connect the long-studied adversarial vulnerabilities of neural networks to the nascent field of AI alignment. The presented attack suggests a fundamental adversarial challenge for AI alignment, especially in light of the emerging trend toward multimodality in frontier foundation models.

How Can IJDS Authors, Reviewers, and Editors Use (and Misuse) Generative AI?
  • Research Article · INFORMS Journal on Data Science · Apr 1, 2023 · Galit Shmueli + 7 more · doi:10.1287/ijds.2023.0007 · Cited by 8

A framework for neurosymbolic robot action planning using large language models
  • Research Article · Frontiers in Neurorobotics · Jun 4, 2024 · Alessio Capitanelli + 1 more · doi:10.3389/fnbot.2024.1342786 · Cited by 2

Symbolic task planning is a widely used approach to enforce robot autonomy due to its ease of understanding and deployment in engineered robot architectures. However, techniques for symbolic task planning are difficult to scale in real-world, highly dynamic, human-robot collaboration scenarios because of the poor performance in planning domains where action effects may not be immediate, or when frequent re-planning is needed due to changed circumstances in the robot workspace. The validity of plans in the long term, plan length, and planning time could hinder the robot's efficiency and negatively affect the overall human-robot interaction's fluency. We present a framework, which we refer to as Teriyaki, specifically aimed at bridging the gap between symbolic task planning and machine learning approaches. The rationale is training Large Language Models (LLMs), namely GPT-3, into a neurosymbolic task planner compatible with the Planning Domain Definition Language (PDDL), and then leveraging its generative capabilities to overcome a number of limitations inherent to symbolic task planners. Potential benefits include (i) a better scalability in so far as the planning domain complexity increases, since LLMs' response time linearly scales with the combined length of the input and the output, instead of super-linearly as in the case of symbolic task planners, and (ii) the ability to synthesize a plan action-by-action instead of end-to-end, and to make each action available for execution as soon as it is generated instead of waiting for the whole plan to be available, which in turn enables concurrent planning and execution. In the past year, significant efforts have been devoted by the research community to evaluate the overall cognitive capabilities of LLMs, with alternate successes. Instead, with Teriyaki we aim to providing an overall planning performance comparable to traditional planners in specific planning domains, while leveraging LLMs capabilities in other metrics, specifically those related to their short- and mid-term generative capabilities, which are used to build a look-ahead predictive planning model. Preliminary results in selected domains show that our method can: (i) solve 95.5% of problems in a test data set of 1,000 samples; (ii) produce plans up to 13.5% shorter than a traditional symbolic planner; (iii) reduce average overall waiting times for a plan availability by up to 61.4%.
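Benefit (ii) above, making each action available for execution as soon as it is generated, maps naturally onto a generator that yields actions incrementally. The canned action list below stands in for streamed model output; none of this is Teriyaki's actual code.

```python
import time
from typing import Iterator

def stream_plan(goal: str) -> Iterator[str]:
    """Yield PDDL-style actions one at a time, as if decoded incrementally."""
    for action in ["(pick block-a)", "(place block-a table)", "(pick block-b)"]:
        time.sleep(0.05)          # pretend per-action generation latency
        yield action

for act in stream_plan("rearrange the blocks"):
    print("dispatching:", act)    # execution overlaps the remaining generation
```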

The Future of Intelligent Healthcare: A Systematic Analysis and Discussion on the Integration and Impact of Robots Using Large Language Models for Healthcare
  • Research Article · Robotics · Jul 23, 2024 · Souren Pashangpour + 1 more · doi:10.3390/robotics13080112 · Cited by 7

The potential use of large language models (LLMs) in healthcare robotics can help address the significant demand put on healthcare systems around the world with respect to an aging demographic and a shortage of healthcare professionals. Even though LLMs have already been integrated into medicine to assist both clinicians and patients, the integration of LLMs within healthcare robots has not yet been explored for clinical settings. In this perspective paper, we investigate the groundbreaking developments in robotics and LLMs to uniquely identify the needed system requirements for designing health-specific LLM-based robots in terms of multi-modal communication through human–robot interactions (HRIs), semantic reasoning, and task planning. Furthermore, we discuss the ethical issues, open challenges, and potential future research directions for this emerging innovative field.

Enhancement of long-horizon task planning via active and passive modification in large language models
  • Research Article · Scientific Reports · Feb 28, 2025 · Kazuki Hori + 2 more · doi:10.1038/s41598-025-91448-4

This study proposes a method for generating complex and long-horizon off-line task plans using large language models (LLMs). Although several studies have been conducted in recent years on robot task planning using LLMs, the planning results tend to be simple, consisting of ten or fewer action commands, depending on the task. In the proposed method, the LLM actively collects missing information by asking questions, and the task plan is upgraded with one dialog example. One of the contributions of this study is a Q&A process in which ambiguity judgment is left to the LLM. By sequentially eliminating ambiguities contained in long-horizon tasks through dialogue, our method increases the amount of information included in movement plans. This study aims to further refine action plans obtained from active modification through dialogue by passive modification, and few studies have addressed these issues for long-horizon robot tasks. In our experiments, we define the number of items in the task planning as information for robot task execution, and we demonstrate the effectiveness of the proposed method through dialogue experiments using a cooking task as the subject. And as a result of the experiment, the amount of information could be increased by the proposed method.
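The active-modification step, in which the LLM raises questions about missing information and the answers are folded back into the task description, can be sketched as a short dialogue loop. Both callables below are canned stubs invented for illustration.

```python
from typing import Callable

def refine_task(task: str,
                ask: Callable[[str], str],
                answer: Callable[[str], str],
                max_turns: int = 5) -> str:
    """Let the model raise questions until it reports no remaining ambiguity;
    an empty question means the task is considered fully specified."""
    for _ in range(max_turns):
        question = ask(task)
        if not question:
            break
        task += f"\nQ: {question}\nA: {answer(question)}"
    return task

questions = iter(["How many servings?", ""])      # canned LLM behaviour
print(refine_task("prepare a salad",
                  ask=lambda t: next(questions),
                  answer=lambda q: "two servings"))
```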

Large language and vision-language models for robot: safety challenges, mitigation strategies and future directions
  • Research Article · Industrial Robot: the international journal of robotics research and application · Jul 29, 2025 · Xiangyu Hu + 1 more · doi:10.1108/ir-02-2025-0074

Purpose This study aims to explore the integration of large language models (LLMs) and vision-language models (VLMs) in robotics, highlighting their potential benefits and the safety challenges they introduce, including robustness issues, adversarial vulnerabilities, privacy concerns and ethical implications. Design/methodology/approach This survey conducts a comprehensive analysis of the safety risks associated with LLM- and VLM-powered robotic systems. The authors review existing literature, analyze key challenges, evaluate current mitigation strategies and propose future research directions. Findings The study identifies that ensuring the safety of LLM-/VLM-driven robots requires a multi-faceted approach. While current mitigation strategies address certain risks, gaps remain in real-time monitoring, adversarial robustness and ethical safeguards. Originality/value This study offers a structured and comprehensive overview of the safety challenges in LLM-/VLM-driven robotics. It contributes to ongoing discussions by integrating technical, ethical and regulatory perspectives to guide future advancements in safe and responsible artificial intelligence-driven robotics.

Jigsaw
  • Conference Article · May 21, 2022 · Naman Jain + 6 more · doi:10.1145/3510003.3510203 · Cited by 100

Large pre-trained language models such as GPT-3 [10], Codex [11], and Google's language model [7] are now capable of generating code from natural language specifications of programmer intent. We view these developments with a mixture of optimism and caution. On the optimistic side, such large language models have the potential to improve productivity by providing an automated AI pair programmer for every programmer in the world. On the cautionary side, since these large language models do not understand program semantics, they offer no guarantees about quality of the suggested code. In this paper, we present an approach to augment these large language models with post-processing steps based on program analysis and synthesis techniques, that understand the syntax and semantics of programs. Further, we show that such techniques can make use of user feedback and improve with usage. We present our experiences from building and evaluating such a tool Jigsaw, targeted at synthesizing code for using Python Pandas API using multi-modal inputs. Our experience suggests that as these large language models evolve for synthesizing code from intent, Jigsaw has an important role to play in improving the accuracy of the systems.
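The post-processing idea, checking generated code with program analysis before accepting it, can be sketched with Python's ast module: parse the candidate and reject anything that fails syntax or calls methods outside an allowed set. The allowed-method list is invented, and Jigsaw's real checks (and its learning from user feedback) are far richer.

```python
import ast

ALLOWED_METHODS = {"read_csv", "dropna", "head"}   # hypothetical Pandas subset

def passes_checks(code: str) -> bool:
    """Accept candidate code only if it parses and calls allowed methods only."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False
    called = {node.func.attr for node in ast.walk(tree)
              if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)}
    return called <= ALLOWED_METHODS

print(passes_checks("df = pd.read_csv('x.csv').dropna()"))   # True
print(passes_checks("os.system('rm -rf /')"))                # False: disallowed call
```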

Extraction of geoprocessing modeling knowledge from crowdsourced Google Earth Engine scripts by coordinating large and small language models
  • Research Article · International Journal of Geographical Information Science · Nov 1, 2025 · Anqi Zhao + 7 more · doi:10.1080/13658816.2025.2577252

The widespread use of online geoinformation platforms, such as Google Earth Engine (GEE), has produced numerous scripts. Extracting domain knowledge from these crowdsourced scripts supports understanding of geoprocessing workflows. Small Language Models (SLMs) are effective for semantic embedding but struggle with complex code; Large Language Models (LLMs) can summarize scripts, yet lack consistent geoscience terminology to express knowledge. In this paper, we propose Geo-CLASS, a knowledge extraction framework for geospatial analysis scripts that coordinates large and small language models. Specifically, we designed domain-specific schemas and a schema-aware prompt strategy to guide LLMs to generate and associate entity descriptions, and employed SLMs to standardize the outputs by mapping these descriptions to a constructed geoscience knowledge base. Experiments on 237 GEE scripts, selected from 295,943 scripts in total, demonstrated that our framework outperformed LLM baselines, including Llama-3, GPT-3.5 and GPT-4o. In comparison, the proposed framework improved accuracy in recognizing entities and relations by up to 31.9% and 12.0%, respectively. Ablation studies and performance analysis further confirmed the effectiveness of key components and the robustness of the framework. Geo-CLASS has the potential to enable the construction of geoprocessing modeling knowledge graphs, facilitate domain-specific reasoning and advance script generation via Retrieval-Augmented Generation (RAG).
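The division of labour described here, where LLMs draft entity descriptions and SLMs snap them onto a controlled vocabulary, can be sketched with token overlap standing in for learned sentence embeddings. The knowledge-base terms below are invented examples.

```python
def overlap(a: str, b: str) -> float:
    """Jaccard overlap of lowercase tokens (a stand-in for SLM embeddings)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb)

# Hypothetical geoscience knowledge-base terms.
KB_TERMS = ["cloud masking", "NDVI computation", "image compositing"]

def standardize(description: str) -> str:
    """Map a free-form LLM description to the closest knowledge-base term."""
    return max(KB_TERMS, key=lambda term: overlap(description, term))

print(standardize("the script removes cloud pixels by masking QA bands"))
```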

A Large and Diverse Arabic Corpus for Language Modeling
  • Research Article · Procedia Computer Science · Jan 1, 2023 · Abbas Raza Ali + 3 more · doi:10.1016/j.procs.2023.09.086 · Cited by 7

Large Language Models: A Historical and Sociocultural Perspective
  • Discussion · Cognitive Science · Mar 1, 2024 · Eugene Yu Ji · doi:10.1111/cogs.13430

This letter explores the intricate historical and contemporary links between large language models (LLMs) and cognitive science through the lens of information theory, statistical language models, and socioanthropological linguistic theories. The emergence of LLMs highlights the enduring significance of information-based and statistical learning theories in understanding human communication. These theories, initially proposed in the mid-20th century, offered a visionary framework for integrating computational science, social sciences, and humanities, which nonetheless was not fully fulfilled at that time. The subsequent development of sociolinguistics and linguistic anthropology, especially since the 1970s, provided critical perspectives and empirical methods that both challenged and enriched this framework. This letter proposes that two pivotal concepts derived from this development, metapragmatic function and indexicality, offer a fruitful theoretical perspective for integrating the semantic, textual, and pragmatic, contextual dimensions of communication, an amalgamation that contemporary LLMs have yet to fully achieve. The author believes that contemporary cognitive science is at a crucial crossroads, where fostering interdisciplinary dialogues among computational linguistics, social linguistics and linguistic anthropology, and cognitive and social psychology is in particular imperative. Such collaboration is vital to bridge the computational, cognitive, and sociocultural aspects of human communication and human-AI interaction, especially in the era of large language and multimodal models and human-centric Artificial Intelligence (AI).

High Throughput Phenotyping of Physician Notes with Large Language and Hybrid NLP Models
  • Research Article · Annual International Conference of the IEEE Engineering in Medicine and Biology Society · Jul 15, 2024 · Syed I Munzir + 2 more · doi:10.1109/embc53108.2024.10782119 · Cited by 3

Deep phenotyping is the detailed description of patient signs and symptoms using concepts from an ontology. The deep phenotyping of the numerous physician notes in electronic health records requires high throughput methods. Over the past 30 years, progress toward making high-throughput phenotyping feasible. In this study, we demonstrate that a large language model and a hybrid NLP model (combining word vectors with a machine learning classifier) can perform high throughput phenotyping on physician notes with high accuracy. Large language models will likely emerge as the preferred method for high throughput deep phenotyping physician notes.Clinical relevance: Large language models will likely emerge as the dominant method for the high throughput phenotyping of signs and symptoms in physician notes.

CancerGPT for few shot drug pair synergy prediction using large pretrained language models
  • Research Article · NPJ Digital Medicine · Feb 19, 2024 · Tianhao Li + 6 more · doi:10.1038/s41746-024-01024-9 · Cited by 50

Large language models (LLMs) have been shown to have significant potential in few-shot learning across various fields, even with minimal training data. However, their ability to generalize to unseen tasks in more complex fields, such as biology and medicine has yet to be fully evaluated. LLMs can offer a promising alternative approach for biological inference, particularly in cases where structured data and sample size are limited, by extracting prior knowledge from text corpora. Here we report our proposed few-shot learning approach, which uses LLMs to predict the synergy of drug pairs in rare tissues that lack structured data and features. Our experiments, which involved seven rare tissues from different cancer types, demonstrate that the LLM-based prediction model achieves significant accuracy with very few or zero samples. Our proposed model, the CancerGPT (with ~ 124M parameters), is comparable to the larger fine-tuned GPT-3 model (with ~ 175B parameters). Our research contributes to tackling drug pair synergy prediction in rare tissues with limited data, and also advancing the use of LLMs for biological and medical inference tasks.
