APOLLO: Autonomous Predictive On-Chain Learning Orchestrator for AI-Driven Blockchain Governance

Abstract

Decentralized Autonomous Organizations (DAOs) face critical governance challenges, including low voter participation, dominance by large token holders, and inefficient manual proposal analysis. We propose APOLLO (Autonomous Predictive On-Chain Learning Orchestrator), an AI-powered approach that automates the governance lifecycle to address these problems. APOLLO's multi-agent system is powered by the gemma-3-4b Large Language Model (LLM) in conjunction with Retrieval-Augmented Generation (RAG), which enhances contextual comprehension of proposals. The system merges real-time on-chain and off-chain data to enable adaptive decision-making. Its major contributions include automated proposal writing, logistic regression-based approval probability prediction, and real-time vote outcome analysis with contextual feature-based confidence scores. The LLM drafts proposals, and a feedback loop enriches its knowledge base, reducing whale dominance and voter apathy through a transparent, bias-resistant system. This work demonstrates the transformative potential of AI in promoting decentralized governance, paving the way for more effective, inclusive, and dynamic DAO systems.
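The approval-probability step named in the abstract can be sketched as a logistic regression over proposal features. The feature names and coefficients below are illustrative assumptions, not values from the paper; in a system like APOLLO they would be learned from historical governance votes.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative coefficients standing in for weights learned from past votes;
# the feature names are assumptions, not taken from the paper.
WEIGHTS = {"proposer_reputation": 2.0, "expected_turnout": 1.5, "treasury_ask": -6.0}
BIAS = -0.5

def approval_probability(features: dict) -> float:
    """Logistic-regression score for a proposal's chance of approval."""
    z = BIAS + sum(WEIGHTS[name] * value for name, value in features.items())
    return sigmoid(z)

p = approval_probability({
    "proposer_reputation": 0.8,  # normalized reputation in [0, 1]
    "expected_turnout": 0.55,    # fraction of eligible voting power
    "treasury_ask": 0.03,        # fraction of treasury requested
})
```

A proposal with a reputable proposer, decent expected turnout, and a small treasury ask scores high; a large treasury ask pulls the probability down via its negative weight.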

Similar Papers
  • Conference Article
  • 10.2118/229606-ms
An LLM-Based Multi-Agent System for Whole Life-Cycle Downhole Maintenance Operations
  • Nov 3, 2025
  • Yongnan Chen + 4 more

This manuscript proposes a Large Language Model (LLM)-based multi-agent system for the whole life cycle of downhole maintenance (DMAS). To address complex well conditions and time-consuming manual processes in downhole maintenance, the system leverages LLM abilities and agent collaboration. Compared to standalone LLMs or isolated agents, a multi-agent system more effectively manages complex tasks and domain-specific demands. DMAS compresses downhole maintenance operations, including condition monitoring, fault prognosis, measurement decision, and operation evaluation, into a natural-language interactive entry point, thereby simplifying the human-involved steps and increasing overall efficiency and accuracy. DMAS consists of three key functional components: (a) fault prognosis and measurement decision, (b) operation monitoring, and (c) operation evaluation. LLM-based agents are designed to complete the sub-tasks in each component. The system coordinates and manages the agents using the Self-closed-loop Intelligent Agent Collaborative System Framework (CISFA), with atomic abilities, e.g., retrieval-augmented generation, inverted-retrieval reinforcement, and Text-to-SQL methods, deployed in the lower layer to perform specific tasks. For easy expansion, the system connects to enterprise-level information systems and open/closed-source LLMs through built-in APIs. The system recognizes faults, e.g., waxing, valve leakage, and gas effects, using the indicator diagram, production data, and well-performing machine learning algorithms accessed through the enterprise information systems, and recommends measurements based on the enterprise knowledge databases via retrieval tools. The validation methodology includes field testing across multiple well sites and comparison with traditional manual processes. The implementation of DMAS demonstrates significant improvements in downhole operations efficiency and accuracy.
Through extensive field testing, the system achieved a 40% reduction in operation planning time compared to manual expert review processes. The multi-agent collaboration with human-in-the-loop confirmation successfully handled complex scenarios, with the fault prognosis component showing 95.72% accuracy in predicting potential failures. These agents effectively processed and analyzed data streams, providing timely insights and recommendations through indicator diagram analysis performed by the prognostic machine learning model. The evaluation component quantifies the production improvements and well condition enhancements, validating the effectiveness of the implemented maintenance operations. These results validate the effectiveness of LLM-based multi-agent systems for downhole maintenance. DMAS demonstrates significant novelty in three key aspects: (a) the design and implementation of intelligent applications with human-in-the-loop interactions for complex decision-making scenarios, introducing a new paradigm for expert-system collaboration; (b) substantial efficiency improvements, with quantifiable benefits in operation time reduction and accuracy enhancement; and (c) the successful integration of enterprise databases, real-time data analysis, and local models into a cohesive industrial system architecture. The system's ability to combine LLM abilities with domain-specific requirements represents a significant advancement in practical AI deployment for the oil and gas industry.

  • Preprint Article
  • 10.2196/preprints.68320
Knowledge Enhancement of Small-Scale Models in Medical Question Answering (Preprint)
  • Nov 3, 2024
  • Xinbai Li + 3 more

BACKGROUND Medical question answering (QA) is essential for various medical applications. While small-scale pre-trained language models (PLMs) are widely adopted in open-domain QA tasks through fine-tuning with related datasets, applying this approach in the medical domain requires significant and rigorous integration of external knowledge. Knowledge-enhanced small-scale PLMs have been proposed to incorporate knowledge bases (KBs) to improve performance, as KBs contain vast amounts of factual knowledge. Large language models (LLMs) contain a vast amount of knowledge and have attracted significant research interest due to their outstanding natural language processing (NLP) capabilities. KBs and LLMs can provide external knowledge to enhance small-scale models in medical QA. OBJECTIVE KBs consist of structured factual knowledge that must be converted into sentences to align with the input format of PLMs. However, these converted sentences often lack semantic coherence, potentially causing them to deviate from the intrinsic knowledge of KBs. LLMs, on the other hand, can generate natural, semantically rich sentences, but they may also produce irrelevant or inaccurate statements. The retrieval-augmented generation (RAG) paradigm enhances LLMs by retrieving relevant information from an external database before responding. By integrating LLMs and KBs using the RAG paradigm, it is possible to generate statements that combine the factual knowledge of KBs with the semantic richness of LLMs, thereby enhancing the performance of small-scale models. In this paper, we explore a RAG fine-tuning method, RAG-mQA, that combines KBs and LLMs to improve small-scale models in medical QA. METHODS In the RAG fine-tuning scenario, we adopt medical KBs as an external database to augment the text generation of LLMs, producing statements that integrate medical domain knowledge with semantic knowledge.
Specifically, KBs are used to extract medical concepts from the input text, while LLMs are tasked with generating statements based on these extracted concepts. In addition, we introduce two strategies for constructing knowledge: KB-based and LLM-based construction. In the KB-based scenario, we extract medical concepts from the input text using KBs and convert them into sentences by connecting the concepts sequentially. In the LLM-based scenario, we provide the input text to an LLM, which generates relevant statements to answer the question. For downstream QA tasks, the knowledge produced by these three strategies is inserted into the input text to fine-tune a small-scale PLM. F1 and exact match (EM) scores are employed as evaluation metrics for performance comparison. Fine-tuned PLMs without knowledge insertion serve as baselines. Experiments are conducted on two medical QA datasets: emrQA (English) and MedicalQA (Chinese). RESULTS RAG-mQA achieved the best results on both datasets. On the MedicalQA dataset, compared to the KB-based and LLM-based enhancement methods, RAG-mQA improved the F1 score by 0.59% and 2.36%, and the EM score by 2.96% and 11.18%, respectively. On the emrQA dataset, the EM score of RAG-mQA exceeded those of the KB-based and LLM-based methods by 4.65% and 7.01%, respectively. CONCLUSIONS Experimental results demonstrate that the RAG fine-tuning method can improve model performance in medical QA. RAG-mQA achieves greater improvements compared to other knowledge-enhanced methods. CLINICALTRIAL This study does not involve trial registration.
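The knowledge-insertion step described above, where retrieved or generated statements are added to the input text before fine-tuning the small PLM, can be sketched as follows. The separator token and function name are illustrative assumptions, not the paper's implementation.

```python
def build_augmented_input(question, knowledge_statements, sep=" [SEP] "):
    """Prepend external knowledge statements to the QA input text."""
    return " ".join(knowledge_statements) + sep + question

# Hypothetical medical example: statements a KB/LLM pipeline might produce
augmented = build_augmented_input(
    "What is the first-line treatment for hypertension?",
    [
        "Hypertension is persistently elevated arterial blood pressure.",
        "Thiazide diuretics are a common first-line antihypertensive class.",
    ],
)
```

The augmented string, rather than the bare question, is what the small-scale PLM would see during fine-tuning.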

  • Research Article
  • 10.1080/13658816.2025.2577252
Extraction of geoprocessing modeling knowledge from crowdsourced Google Earth Engine scripts by coordinating large and small language models
  • Nov 1, 2025
  • International Journal of Geographical Information Science
  • Anqi Zhao + 7 more

The widespread use of online geoinformation platforms, such as Google Earth Engine (GEE), has produced numerous scripts. Extracting domain knowledge from these crowdsourced scripts supports understanding of geoprocessing workflows. Small Language Models (SLMs) are effective for semantic embedding but struggle with complex code; Large Language Models (LLMs) can summarize scripts, yet lack consistent geoscience terminology to express knowledge. In this paper, we propose Geo-CLASS, a knowledge extraction framework for geospatial analysis scripts that coordinates large and small language models. Specifically, we designed domain-specific schemas and a schema-aware prompt strategy to guide LLMs to generate and associate entity descriptions, and employed SLMs to standardize the outputs by mapping these descriptions to a constructed geoscience knowledge base. Experiments on 237 GEE scripts, selected from 295,943 scripts in total, demonstrated that our framework outperformed LLM baselines, including Llama-3, GPT-3.5 and GPT-4o. In comparison, the proposed framework improved accuracy in recognizing entities and relations by up to 31.9% and 12.0%, respectively. Ablation studies and performance analysis further confirmed the effectiveness of key components and the robustness of the framework. Geo-CLASS has the potential to enable the construction of geoprocessing modeling knowledge graphs, facilitate domain-specific reasoning and advance script generation via Retrieval-Augmented Generation (RAG).

  • Research Article
  • Cited by 33
  • 10.1001/jamaophthalmol.2024.2513
Development and Evaluation of a Retrieval-Augmented Large Language Model Framework for Ophthalmology
  • Jul 18, 2024
  • JAMA Ophthalmology
  • Ming-Jie Luo + 13 more

Although augmenting large language models (LLMs) with knowledge bases may improve medical domain-specific performance, practical methods are needed for local implementation of LLMs that address privacy concerns and enhance accessibility for health care professionals. The objective was to develop an accurate, cost-effective local implementation of an LLM to mitigate privacy concerns and support practical deployment in health care settings. ChatZOC (Sun Yat-Sen University Zhongshan Ophthalmology Center), a retrieval-augmented LLM framework, was developed by enhancing a baseline LLM with a comprehensive ophthalmic dataset and evaluation framework (CODE), which includes over 30 000 pieces of ophthalmic knowledge. This LLM was benchmarked against 10 representative LLMs, including GPT-4 and GPT-3.5 Turbo (OpenAI), across 300 clinical questions in ophthalmology. The evaluation, involving a panel of medical experts and biomedical researchers, focused on accuracy, utility, and safety. A double-masked approach was used to minimize bias in the assessment across all models. The study used a comprehensive knowledge base derived from ophthalmic clinical practice, without directly involving clinical patients. The exposure was LLM response to clinical questions; the outcomes were the accuracy, utility, and safety of LLMs in responding to clinical questions. The baseline model achieved a human ranking score of 0.48. The retrieval-augmented LLM had a score of 0.60, a difference of 0.12 (95% CI, 0.02-0.22; P = .02) from baseline and not different from GPT-4 with a score of 0.61 (difference = 0.01; 95% CI, -0.11 to 0.13; P = .89). For scientific consensus, the retrieval-augmented LLM was 84.0% compared with the baseline model of 46.5% (difference = 37.5%; 95% CI, 29.0%-46.0%; P < .001) and not different from GPT-4 with a value of 79.2% (difference = 4.8%; 95% CI, -0.3% to 10.0%; P = .06).
Results of this quality improvement study suggest that the integration of high-quality knowledge bases improved the LLM's performance in medical domains. This study highlights the transformative potential of augmented LLMs in clinical practice by providing reliable, safe, and practical clinical information. Further research is needed to explore the broader application of such frameworks in the real world.

  • Research Article
  • Cited by 2
  • 10.3390/app15137227
Large-Language-Model-Enabled Text Semantic Communication Systems
  • Jun 26, 2025
  • Applied Sciences
  • Zhenyi Wang + 6 more

Large language models (LLMs) have recently demonstrated state-of-the-art performance in various natural language processing (NLP) tasks, achieving near-human levels in multiple language understanding challenges and aligning closely with the core principles of semantic communication. Inspired by LLMs' advancements in semantic processing, we propose LLM-SC, an innovative LLM-enabled semantic communication system framework which applies LLMs directly to physical-layer coding and decoding for the first time. By analyzing the relationship between the training process of LLMs and the optimization objectives of semantic communication, we propose training a semantic encoder through LLMs' tokenizer training and establishing a semantic knowledge base via the LLMs' unsupervised pre-training process. This knowledge base facilitates the construction of an optimal decoder by providing the prior probability of the transmitted language sequence. Based on this, we derive the optimal decoding criterion for the receiver and introduce a beam search algorithm to further reduce complexity. Furthermore, we assert that existing LLMs can be employed directly for LLM-SC without extra re-training or fine-tuning. Simulation results reveal that LLM-SC outperforms conventional DeepSC at signal-to-noise ratios (SNRs) exceeding 3 dB, as it enables error-free transmission of semantic information under high SNRs while DeepSC fails to do so. In addition to semantic-level performance, LLM-SC demonstrates compatibility with technical-level performance, achieving approximately an 8 dB coding gain for a bit error ratio (BER) of 10⁻³ without any channel coding while maintaining the same joint source-channel coding rate as traditional communication systems.
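The receiver-side decoding rule described above, selecting the sequence that maximizes channel likelihood combined with the LM prior, can be illustrated with a toy beam search. The three-word vocabulary, unigram prior, and character-overlap likelihood below are stand-in assumptions; LLM-SC would use a real tokenizer and pre-trained LM probabilities.

```python
import math

# Toy 3-word vocabulary with a stand-in unigram "LM prior"
VOCAB = ["hello", "world", "cat"]
LOG_PRIOR = {"hello": math.log(0.5), "world": math.log(0.4), "cat": math.log(0.1)}

def log_channel_likelihood(received: str, token: str) -> float:
    # Crude stand-in: more character overlap with the noisy symbol = more likely
    overlap = sum(a == b for a, b in zip(received, token))
    return float(overlap - len(token))

def beam_decode(received_tokens, beam_width=2):
    """MAP-style decoding: maximize channel log-likelihood + LM log-prior."""
    beams = [([], 0.0)]  # (sequence, cumulative log-score)
    for received in received_tokens:
        candidates = [
            (seq + [tok],
             score + log_channel_likelihood(received, tok) + LOG_PRIOR[tok])
            for seq, score in beams
            for tok in VOCAB
        ]
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]  # keep only the best few hypotheses
    return beams[0][0]

decoded = beam_decode(["helxo", "worl_"])  # two noisy received symbols
```

The beam prunes the exponential search over sequences to a constant number of hypotheses per step, which is the complexity reduction the abstract attributes to beam search.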

  • Preprint Article
  • 10.35542/osf.io/kq8zh_v1
Automating Thematic Analysis with Multi-Agent LLM Systems
  • Mar 13, 2025
  • Sreecharan Sankaranarayanan + 7 more

Thematic analysis (TA) is a method used to identify, examine, and present themes within data. TA is often a manual, multistep, and time-intensive process requiring collaboration among multiple researchers. TA's iterative subtasks, including coding data, identifying themes, and resolving inter-coder disagreements, are especially laborious for large data sets. Given recent advances in natural language processing, Large Language Models (LLMs) offer the potential for automation at scale. Recent literature has explored the automation of isolated steps of the TA process, tightly coupled with researcher involvement at each step. Research using such hybrid approaches has reported issues in LLM generations, such as hallucination, inconsistent output, and technical limitations (e.g., token limits). This paper proposes a multi-agent system that differs from previous systems in using an orchestrator LLM agent that spins off multiple LLM sub-agents for each step of the TA process, mirroring all the steps previously done manually. Beyond more accurate analysis results, this agent-based iterative coding process is also expected to increase the transparency of the analysis, as analytical stages are documented step by step. We study the extent to which such a system can perform a full TA without human supervision. Preliminary results indicate human-quality codes and themes, based on alignment with human-derived codes. Nevertheless, we still observe differences in coding complexity and thematic depth. Despite these differences, the system provides critical insights on the path to TA automation, and our open-source datasets, coding results, and analysis support consistency, efficiency, and transparency in future qualitative data analysis.
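The orchestrator pattern the paper describes, one coordinator running a sub-agent per TA step and logging each stage, can be sketched as follows. The step names and the stubbed agent functions are hypothetical placeholders for LLM-backed agents.

```python
def coding_agent(excerpts):
    # Stub: assign each excerpt a code from its first word (an LLM would do this)
    return {e: f"code:{e.split()[0].lower()}" for e in excerpts}

def theming_agent(codes):
    # Stub: collapse codes into a sorted list of candidate themes
    return sorted(set(codes.values()))

# Hypothetical pipeline of TA steps, each handled by its own sub-agent
PIPELINE = [("coding", coding_agent), ("theming", theming_agent)]

def orchestrate(excerpts):
    """Run each sub-agent in turn, documenting every analytical stage."""
    artifact, log = excerpts, []
    for step_name, agent in PIPELINE:
        artifact = agent(artifact)
        log.append(step_name)  # step-by-step record for transparency
    return artifact, log

themes, log = orchestrate(
    ["Privacy concerns dominate", "Privacy tradeoffs accepted"]
)
```

The per-step log is the mechanism behind the transparency claim: every intermediate artifact and stage is recorded rather than hidden inside one monolithic prompt.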

  • Research Article
  • 10.18523/2617-3808.2024.7.98-101
Modern Approaches to Using Knowledge Bases to Address the Challenges of Large Language Models
  • May 12, 2025
  • NaUKMA Research Papers. Computer Science
  • Maksym Androshchuk

This paper examines the potential of integrating Large Language Models (LLMs) with knowledge bases to improve the accuracy and reliability of their responses. The advantages of such a combination are evaluated, particularly in reducing the risk of hallucinations – the phenomenon where models generate erroneous or fabricated information. Various methodologies for combining LLMs with knowledge bases are analyzed, along with their respective advantages and limitations. The prospects and challenges of implementing this technology in diverse fields, such as information retrieval, decision support, and automated content creation, are discussed. The paper presents an overview of the current state of research in this domain and delineates directions for future investigation. The integration of LLMs with knowledge bases represents a significant advancement in artificial intelligence, addressing one of the key concerns regarding LLMs: their tendency to generate inaccurate or fabricated information, commonly referred to as hallucinations. This approach leverages the vast language understanding and generation capabilities of LLMs while grounding their outputs in structured and verified information from knowledge bases. The synergy between these two technologies has the potential to significantly enhance the reliability and factual accuracy of AI-generated responses across a wide range of applications. The methodologies for combining LLMs with knowledge bases differ in their implementation and effectiveness. Some approaches involve pre-training LLMs on curated knowledge bases, while others reference knowledge bases externally during the inference process. Each method presents its own set of advantages and challenges, such as balancing computational efficiency against accuracy and maintaining the fluency of LLM outputs while adhering strictly to factual information.
The application of this integrated technology extends beyond mere information retrieval, showing promise in complex decision support systems, automated content creation for specialized domains, and contributing to the advancement of explainable AI by providing traceable sources for generated information. As research in this area progresses, it is expected to open new avenues for developing more trustworthy and capable AI systems across various industries and academic disciplines.

  • Research Article
  • Cited by 6
  • 10.1016/j.artmed.2025.103078
Empowering large language models for automated clinical assessment with generation-augmented retrieval and hierarchical chain-of-thought.
  • Apr 1, 2025
  • Artificial intelligence in medicine
  • Zhanzhong Gu + 3 more


  • Research Article
  • Cited by 1
  • 10.2196/66503
Performance Assessment of ChatGPT-4.0 and ChatGLM Series in Traditional Chinese Medicine for Metabolic Associated Fatty Liver Disease: Comparative Study
  • Aug 25, 2025
  • JMIR Formative Research
  • Xionghui Wang + 5 more

Background: ChatGPT-4.0 and the ChatGLM series are novel conversational large language models (LLMs). ChatGLM includes 3 versions: ChatGLM4 (with internet connectivity but no knowledge base pretraining), ChatGLM4+Knowledge base (combining internet search capabilities with knowledge base pretraining), and ChatGLM3-6B (offline knowledge base pretraining but no internet connectivity). The ability of ChatGPT-4.0 and ChatGLM to apply medical knowledge in the Chinese environment has been preliminarily verified, but the potential of the 2 models for clinical assistance in traditional Chinese medicine (TCM) is still unknown. Objective: This study aims to explore the performance of ChatGPT-4.0, ChatGLM4, ChatGLM4+Knowledge base, and ChatGLM3-6B in providing AI-assisted diagnosis and treatment for metabolic dysfunction-associated fatty liver disease within a TCM clinical framework, thereby assessing their potential as TCM clinical decision support tools. Methods: This study evaluated 4 LLMs by providing them with medical records of 87 metabolic dysfunction-associated fatty liver disease cases treated with TCM and querying them about TCM treatment plans. The answering texts from the 4 LLMs were evaluated using predefined scoring criteria, focusing on 3 critical dimensions: ability in syndrome differentiation and treatment principles, confusion of concepts between TCM and Western medicine, and comprehensive evaluation of question-answering texts (comprising 6 components: ability to integrate Chinese and Western medicine, ability to formulate treatment plans, health management capacity, disease monitoring ability, self-positioning awareness, and medication safety). Results: In the evaluation module of "Ability in syndrome differentiation and treatment principles," the performance ranking of the 4 models was: (1) ChatGLM4+Knowledge base, (2) ChatGLM4, (3) ChatGLM3-6B, and (4) ChatGPT-4.0. Regarding the assessment of confusion between TCM and Western medicine concepts, ChatGPT-4.0 exhibited conceptual confusion in 32 out of 87 cases, while the ChatGLM series of LLMs showed no such confusion (except for ChatGLM3-6B, which had 1 instance). In the "Comprehensive evaluation of question-answering texts" module, the ranking was: (1) ChatGLM4+Knowledge base, (2) ChatGPT-4.0, (3) ChatGLM4, and (4) ChatGLM3-6B. Conclusions: Our study results demonstrated that real-time internet connectivity played a critical role in LLM-assisted TCM diagnosis and treatment, while offline models showed significantly reduced performance in clinical decision support. Furthermore, pretraining LLMs with TCM-specific knowledge bases while maintaining internet search capabilities substantially enhanced their diagnostic and therapeutic performance in TCM applications. Importantly, general-purpose LLMs required both domain-specific medical fine-tuning and culturally sensitive adaptation to meet the rigorous standards of TCM clinical practice.

  • Research Article
  • 10.1093/ofid/ofae631.2030
P-1869. Utilizing Large Language Models for Enhanced Decision Support in Travel Medicine Clinic: our experience at Mayo Clinic
  • Jan 29, 2025
  • Open Forum Infectious Diseases
  • John C O’Horo + 5 more

Background The integration of Generative AI (GAI) into healthcare systems is increasingly recognized for its potential to transform patient management. The primary aim of this research was to evaluate and quantify the performance of large language models (LLMs) in generating actionable travel medicine advice. (Figure: architectural design of the Travel Clinic LLM project, with four phases of Discovery, Design, Evaluation, and Implementation/Deployment.) Methods This study utilized two iterative phases of evaluation. In the initial phase, LLMs were prompted with detailed clinical scenarios including demographic data, medical and immunization histories, and specific travel plans. These prompts were designed to mimic typical inquiries encountered in travel consultations. The LLMs' initial responses were generated using the CDC's Yellow Book as a foundational knowledge base. In the subsequent phase, the prompts were refined for greater specificity and clarity, and the knowledge base was enhanced by transitioning to Travax's Travelers' Health database. Additional structured data inputs included an exhaustive list of vaccines from our pharmacy formulary and a detailed table of vaccine contraindications. The responses were evaluated and scored by ID clinicians from the Mayo Clinic. Results Initial findings after the first iteration revealed limited efficacy with recall at 23.9%, an F1 score of 38.6%, accuracy also at 23.9%, and precision maintained at 100%, utilizing the CDC's Yellow Book. With the implementation of Travax and refined prompting techniques, preliminary results suggest a notable improvement in the quality of responses, though detailed scoring is presently underway. Improvements in the LLM's performance can be attributed to several key adjustments: the adoption of a more comprehensive knowledge base, refined prompt engineering, and the incorporation of structured data to support more accurate and detailed recommendations.
The collaborative engagement of Mayo Clinic with Google and Travax facilitated a synergistic approach to optimizing the AI model's utility and integration. Future plans include embedding the LLM into our EMR system. Conclusion The findings from this study highlight the significance of strategic collaborations between large healthcare centers, IT industry, and specialized knowledge database firms in effectively harnessing GAI for clinical use. Disclosures All Authors: No reported disclosures

  • Research Article
  • Cited by 11
  • 10.1109/tvcg.2024.3456350
DracoGPT: Extracting Visualization Design Preferences from Large Language Models.
  • Jan 1, 2025
  • IEEE transactions on visualization and computer graphics
  • Huichen Will Wang + 3 more

Trained on vast corpora, Large Language Models (LLMs) have the potential to encode visualization design knowledge and best practices. However, if they fail to do so, they might provide unreliable visualization recommendations. What visualization design preferences, then, have LLMs learned? We contribute DracoGPT, a method for extracting, modeling, and assessing visualization design preferences from LLMs. To assess varied tasks, we develop two pipelines, DracoGPT-Rank and DracoGPT-Recommend, to model LLMs prompted to either rank or recommend visual encoding specifications. We use Draco as a shared knowledge base in which to represent LLM design preferences and compare them to best practices from empirical research. We demonstrate that DracoGPT can accurately model the preferences expressed by LLMs, enabling analysis in terms of Draco design constraints. Across a suite of backing LLMs, we find that DracoGPT-Rank and DracoGPT-Recommend moderately agree with each other, but both substantially diverge from guidelines drawn from human subjects experiments. Future work can build on our approach to expand Draco's knowledge base to model a richer set of preferences and to provide a robust and cost-effective stand-in for LLMs.

  • Research Article
  • Cited by 5
  • 10.5194/isprs-archives-xlviii-1-w2-2023-1729-2023
TREE-GPT: MODULAR LARGE LANGUAGE MODEL EXPERT SYSTEM FOR FOREST REMOTE SENSING IMAGE UNDERSTANDING AND INTERACTIVE ANALYSIS
  • Dec 14, 2023
  • The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences
  • S Q Du + 4 more

Abstract. This paper introduces a novel framework, Tree-GPT, which incorporates Large Language Models (LLMs) into the forestry remote sensing data workflow, thereby enhancing the efficiency of data analysis. Currently, LLMs are unable to extract or comprehend information from images and may generate inaccurate text due to a lack of domain knowledge, limiting their use in forestry data analysis. To address this issue, we propose a modular LLM expert system, Tree-GPT, that integrates image understanding modules, domain knowledge bases, and toolchains. This empowers LLMs with the ability to comprehend images, acquire accurate knowledge, generate code, and perform data analysis in a local environment. Specifically, the image understanding module extracts structured information from forest remote sensing images by utilizing automatic or interactive generation of prompts to guide the Segment Anything Model (SAM) in generating and selecting optimal tree segmentation results. The system then calculates tree structural parameters based on these results and stores them in a database. Upon receiving a specific natural language instruction, the LLM generates code based on a thought chain to accomplish the analysis task. The code is then executed by an LLM agent in a local environment. For ecological parameter calculations, the system retrieves the corresponding knowledge from the knowledge base and inputs it into the LLM to guide the generation of accurate code. We tested this system on several tasks, including Search, Visualization, and Machine Learning Analysis. The prototype system performed well, demonstrating the potential for dynamic usage of LLMs in forestry research and environmental sciences.

  • Research Article
  • Cited by 3
  • 10.1080/00038628.2025.2488522
Integrating large language models, reinforcement learning, and machine learning for intelligent indoor thermal comfort regulation
  • Apr 9, 2025
  • Architectural Science Review
  • Deli Liu + 3 more

This paper explores a method for integrating large language models (LLMs), reinforcement learning, and machine learning models within multi-agent systems to regulate indoor thermal comfort. Utilizing natural language processing techniques, LLMs interpret user inputs, invoke pre-trained reinforcement learning models and machine learning models, predict current thermal comfort levels, and suggest appropriate actions. The study aims to enhance interaction between individuals and their indoor thermal environment. We selected a publicly available dataset as the foundation of our research. We trained a regression model and a reinforcement learning model using this dataset, integrating them into a multi-agent system's function library for intelligent management of indoor thermal comfort. Small-parameter LLMs were selected to build the natural language processing module and function calling module within the multi-agent system. When users input their current thermal feelings or environmental parameters in natural language, the LLMs can call the pre-trained models to provide suitable action suggestions. Abbreviations: AI: Artificial Intelligence; HVAC: Heating, Ventilation, and Air Conditioning; LLMs: Large Language Models; MAS: Multi-agent System; ML: Machine Learning; RL: Reinforcement Learning; PPD: Predicted Percentage of Dissatisfied; PMV: Predicted Mean Vote; PPO: Proximal Policy Optimization; TAV: Thermal Acceptability Vote; TCV: Thermal Comfort Vote; TSV: Thermal Sensation Vote
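The function-calling flow described above, where the LLM's parsed intent selects a pre-trained model from the multi-agent system's function library, can be sketched as a simple dispatch table. The registry keys and toy predictors are illustrative assumptions, not the authors' trained models.

```python
def predict_pmv(params: dict) -> float:
    # Stand-in for the trained regression model; PMV is clamped to [-3, 3]
    return max(-3.0, min(3.0, (params["air_temp_c"] - 24.0) / 4.0))

def recommend_action(params: dict) -> str:
    # Stand-in for the trained RL policy's suggestion
    return "cool" if predict_pmv(params) > 0.5 else "hold"

# Function library the LLM's function-calling module would dispatch into
FUNCTION_LIBRARY = {
    "predict_comfort": predict_pmv,
    "recommend_action": recommend_action,
}

def dispatch(intent: str, params: dict):
    """Route a parsed user intent to the matching pre-trained model."""
    return FUNCTION_LIBRARY[intent](params)

pmv = dispatch("predict_comfort", {"air_temp_c": 28.0})
action = dispatch("recommend_action", {"air_temp_c": 28.0})
```

In the full system, the LLM would translate a natural-language complaint ("it feels warm in here") into the intent and parameters; the dispatch step itself stays this simple.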

  • Research Article
  • Cited by 6
  • 10.1108/jal-12-2024-0357
The impact of large language models on accounting and future application scenarios
  • May 9, 2025
  • Journal of Accounting Literature
  • Wenyi Li + 4 more

Purpose: This paper examines the transformative impact of large language models (LLMs) on accounting practices and explores future application scenarios. Through a systematic literature review, it highlights the potential of LLMs to enhance efficiency, transparency and innovation across areas such as financial reporting, ESG disclosure, financial analysis and risk management. Additionally, it identifies key challenges, including data quality, privacy and the need for domain-specific adaptations, while proposing actionable strategies to address them. By forecasting advanced applications like intelligent knowledge bases and automated operations, this study provides a roadmap for integrating LLMs into accounting, driving progress and sustainability in the industry. Design/methodology/approach: This study adopts a systematic literature review methodology to explore the impact and future applications of LLMs in accounting. It identifies key research areas by analyzing over 50 high-quality studies selected through extensive keyword searches, Boolean queries and backward and forward citation analyses of seminal works. The review is structured around eight thematic areas, including financial reporting, ESG disclosure and risk management. By synthesizing findings, the study develops a comprehensive framework for understanding the transformative potential of LLMs while addressing associated challenges, such as data security and specialization, to guide future research and practical applications in accounting. Findings: The study reveals that LLMs significantly enhance efficiency, transparency and innovation in accounting by automating processes like financial reporting, ESG disclosure and risk management. They enable advanced applications such as intelligent knowledge bases, budget optimization and automated contract management. However, challenges remain, including the need for high-quality data, domain-specific model training, interdisciplinary talent development and robust data security measures. The findings underscore LLMs' potential to transform accounting practices while emphasizing the importance of theoretical frameworks and strategic planning to address these challenges and fully realize their benefits in driving industry progress and sustainability. Practical implications: The study highlights practical pathways for integrating LLMs into accounting, emphasizing their potential to automate processes, enhance decision-making and improve operational efficiency. Organizations can leverage LLMs for tasks such as financial reporting, ESG analysis and risk management, reducing manual effort and increasing accuracy. Practical implications include the need for targeted training of LLMs in accounting-specific contexts, robust data governance to ensure quality and security and developing interdisciplinary skills among accounting professionals. By addressing these areas, organizations can harness LLMs to drive innovation, streamline operations and achieve sustainable growth in a rapidly evolving business environment. Originality/value: This study provides a comprehensive and systematic analysis of the transformative impact of LLMs on accounting, addressing gaps in fragmented research and limited practical insights. It uniquely integrates theoretical perspectives with practical applications, offering a structured framework for understanding LLMs' role across multiple accounting domains. By identifying key challenges and proposing actionable strategies, the paper delivers original value to both researchers and practitioners, fostering innovation and guiding the integration of LLMs into accounting practices. Its forward-looking approach offers a valuable resource for advancing knowledge and shaping the future of accounting in the digital age.

  • Research Article
  • Cited by 3
  • 10.1080/17452007.2025.2456768
Balancing performance and cost of LLMs in a multi-agent framework for BIM data retrieval
  • Jan 28, 2025
  • Architectural Engineering and Design Management
  • Deli Liu + 2 more

This study explores strategies for optimizing the use of large language models (LLMs) in Building Information Modeling (BIM) data retrieval. BIM data retrieval plays a crucial role in enhancing the efficiency and effectiveness of building management and construction processes. Utilizing LLMs can significantly improve data accessibility, reduce retrieval time, and support better decision-making. We propose a method to match queries of varying complexity with suitable LLMs within a multi-agent system (MAS) to balance accuracy and computational costs. We evaluated three commonly used LLMs (GPT-3.5 Turbo, GPT-4o, and GPT-4 Turbo) and found that GPT-4o strikes a good balance between performance and cost. By encoding and clustering query statements, we effectively classified query difficulty levels and matched them with appropriate models. Our tests showed that the multi-agent system with the planner mechanism reduced costs by nearly 31% while maintaining the same accuracy compared to systems without the mechanism.
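The routing idea described above — classify each query's difficulty, then dispatch it to the cheapest model that can handle it — can be sketched roughly as follows. This is a hypothetical illustration, not the paper's implementation: the feature extraction, difficulty thresholds, keyword list, and cost-tier mapping are all invented for the sketch (the paper instead encodes and clusters query statements), and only the three model names come from the abstract.

```python
# Illustrative sketch of complexity-based LLM routing in a multi-agent
# planner. Features, thresholds, and keywords are assumptions made for
# this example; only the model names are taken from the study.

def query_features(query: str) -> tuple[int, int]:
    """Crude features: token count and count of BIM entity keywords."""
    keywords = {"wall", "door", "slab", "level", "room", "duct"}
    tokens = query.lower().split()
    return len(tokens), sum(t.strip("?,.") in keywords for t in tokens)

def classify_difficulty(query: str) -> str:
    """Bucket a query into easy/medium/hard using hand-picked cutoffs."""
    n_tokens, n_entities = query_features(query)
    if n_tokens <= 8 and n_entities <= 1:
        return "easy"
    if n_tokens <= 20:
        return "medium"
    return "hard"

# Cheapest adequate model per difficulty tier (mapping is illustrative).
ROUTES = {
    "easy": "gpt-3.5-turbo",
    "medium": "gpt-4o",
    "hard": "gpt-4-turbo",
}

def route(query: str) -> str:
    """Return the model a planner agent would dispatch this query to."""
    return ROUTES[classify_difficulty(query)]
```

In a real system the difficulty classifier would be learned from embedded and clustered query statements rather than token counts, but the cost saving comes from the same mechanism: most queries never reach the most expensive model.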
