Articles published on Task Decomposition
838 Search results
- Research Article
- 10.55041/ijsrem61256
- Apr 27, 2026
- INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT
- Krishnaveni V V + 5 more
Abstract: Managing a small business usually means juggling many responsibilities at the same time, such as keeping track of sales, checking inventory, and staying connected with team members. When these tasks are handled manually or with limited tools, it can slow things down, increase mistakes, and make it harder to see how the business is actually doing. This project introduces a web-based system that brings everything together in one place. It includes an online sales platform and an admin dashboard supported by an AI assistant. The system collects everyday business data, like sales and stock levels, and turns it into simple insights that can guide better decisions. It also makes communication easier by helping business owners draft messages for their coworkers, saving both time and effort. By combining smart automation, data insights, and AI support, the system aims to make running a business more organized, efficient, and easier to grow. Index Terms: Agentic AI, Generative AI, Intelligent Assistants, Task Decomposition, Tool Integration, Business Innovation, Entrepreneurship.
- Research Article
- 10.3390/app16083851
- Apr 15, 2026
- Applied Sciences
- Beril Yalcinkaya + 3 more
Efficient task allocation and coordination are critical for heterogeneous multi-agent systems operating in dynamic field environments. This paper presents a closed-loop framework that integrates Large Language Models (LLMs) with graph-based optimisation to enable end-to-end task decomposition, allocation, and adaptive execution. High-level task scripts are initially parsed by an LLM into structured execution flows, which are transformed into Directed Acyclic Graphs (DAGs) capturing action-level dependencies. A Genetic Algorithm (GA) then optimises agent-to-task assignments by minimising makespan under capability and battery constraints. To ensure robustness, the framework incorporates an LLM-driven recovery module that enables localised replanning under execution failures without interrupting unaffected agents. System-level experiments in a high-fidelity agroforestry simulation demonstrate a 37% increase (p<0.001) in harvesting productivity and a 19% reduction in human idle time compared to manual baselines. Under mid-execution failures, the system maintains significantly higher performance, with replanning latencies averaging 24 s. The framework scales to large fleets (up to 1000 agents) and effectively enhances human–robot collaboration through structured, dependency-aware coordination.
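The abstract describes a Genetic Algorithm that scores agent-to-task assignments by makespan over a dependency DAG. A minimal sketch of such a fitness function is given below; it is not the authors' implementation. The task names, durations, and the rule that each agent runs one task at a time are illustrative assumptions, and the capability and battery constraints mentioned in the abstract are omitted.

```python
from graphlib import TopologicalSorter


def makespan(durations, deps, assignment):
    """Finish time of the last task for a given agent-to-task assignment.

    durations:  task -> duration (hypothetical time units)
    deps:       task -> set of prerequisite tasks (the DAG edges)
    assignment: task -> agent name
    Assumes each agent executes one task at a time.
    """
    graph = {task: set(deps.get(task, ())) for task in durations}
    agent_free = {}  # agent -> time at which the agent becomes idle
    finish = {}      # task  -> completion time
    # Visit tasks so that every prerequisite is scheduled before its dependents.
    for task in TopologicalSorter(graph).static_order():
        agent = assignment[task]
        ready = max((finish[d] for d in graph[task]), default=0.0)
        start = max(ready, agent_free.get(agent, 0.0))
        finish[task] = start + durations[task]
        agent_free[agent] = finish[task]
    return max(finish.values(), default=0.0)


# Hypothetical four-task job.
durations = {"scout": 2, "cut": 3, "haul": 4, "log": 1}
deps = {"cut": {"scout"}, "haul": {"cut"}, "log": {"scout"}}

one_agent = {task: "A" for task in durations}
two_agents = {"scout": "A", "cut": "A", "haul": "A", "log": "B"}

print(makespan(durations, deps, one_agent))
print(makespan(durations, deps, two_agents))
```

A GA would call a function like this once per candidate assignment and keep the assignments with the smallest value.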
- Research Article
- 10.1016/j.watres.2026.125433
- Apr 1, 2026
- Water research
- Jian Wang + 2 more
Water distribution networks (WDNs), a critical part of urban infrastructure, normally require numerous model simulations for effective planning and management. However, traditional WDN modelling requires complex workflows and specialized expertise. EPANET is the most widely adopted modelling tool for WDN hydraulics and water quality simulations, yet its operational complexity restricts accessibility and slows timely decision-making. Recent advances in large language models (LLMs) have led to the development of agentic artificial intelligence systems that autonomously coordinate tasks and control complex engineering simulations through natural language prompts. Here we introduce EPANET-Agentic, a multi-agent system that integrates advanced workflow reasoning with the EPANET simulator and incorporates human-in-the-loop oversight for critical interventions. The new platform adopts an orchestrator-centred, tool-driven architecture that nests three specialised agents (TaskExecutor, CodeRunner, and DataAnalyzer) as function-call tools. This design enables autonomous task decomposition, precise tool invocation, and transparent workflow management. The abilities of EPANET-Agentic are evaluated on three benchmark networks (i.e., L-Town, C-Town, and Net3) across four categories of tasks: System Characteristics, System Dynamics, System Operation, and Scenario Simulation. The results demonstrate that EPANET-Agentic achieved a 100% success rate and tool invocation accuracy with no human interventions. Moreover, the multimodal DataAnalyzer agent provided valid interpretations of simulation results, while the nested tool design ensured robustness and the architecture exhibited strong scalability across diverse hydraulic analysis tasks. These findings confirm that EPANET-Agentic enables natural language-controlled WDN simulation and analysis with engineering-grade reliability, while still adhering to a human-in-the-loop approach required for safety-critical systems. 
With its modular architecture and strong adaptability, EPANET-Agentic marks a step change from conventional WDN modelling approaches, positioning itself as a next-generation platform for complex planning and management challenges.
- Research Article
- 10.71465/fair747
- Mar 29, 2026
- Frontiers in Artificial Intelligence Research
- Jiawei Li + 2 more
Agentic AI has emerged as a promising paradigm for autonomous reasoning and execution in complex AI-driven applications; however, its effective deployment in cloud-native environments remains challenging due to the lack of unified platform architectures that jointly support task decomposition, multi-agent collaboration, and adaptive cloud resource orchestration. In practical scenarios such as automated data analytics, AI DevOps, and MLOps pipelines, Agentic AI systems must operate over dynamic containerized infrastructures where resource availability, execution cost, and failure conditions continuously change. Existing approaches typically decouple agent-level decision making from cloud-native scheduling, resulting in limited scalability and poor robustness. To address these limitations, this paper proposes CANAO, a Cloud-Aware Native Agentic AI framework for adaptive task orchestration in cloud-native environments. CANAO models complex AI workloads as dynamically reconfigurable task dependency graphs and enables coordinated collaboration among Planner, Executor, and Critic agents. By incorporating real-time cloud resource awareness into the agent orchestration loop, CANAO supports adaptive scheduling, partial task re-planning, and self-healing execution on Kubernetes-based platforms. A prototype system is implemented using cloud-native technologies and evaluated on representative automated data analysis and AI DevOps workflows. Experimental results show that CANAO significantly outperforms baseline orchestration methods under dynamic cloud conditions. Compared with static DAG-based scheduling, CANAO reduces end-to-end task execution time by approximately 34.3% and cloud resource cost by nearly 30%, while lowering the task failure rate by over 34%. These improvements demonstrate the effectiveness of cloud-aware agent collaboration and adaptive task orchestration in large-scale cloud-native AI workflows.
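Partial re-planning over a task dependency graph can be illustrated with a short sketch: when one task fails, only that task and everything downstream of it needs to be rescheduled. The code below is an illustration under assumed names (`affected_tasks` and the workflow stages are hypothetical) and does not reproduce CANAO's actual mechanism.

```python
from collections import deque


def affected_tasks(deps, failed):
    """Return the failed task plus every task that transitively depends on it.

    deps: task -> set of prerequisite tasks.
    """
    # Invert the prerequisite map so each task points at its dependents.
    dependents = {}
    for task, prereqs in deps.items():
        for prereq in prereqs:
            dependents.setdefault(prereq, set()).add(task)

    seen = {failed}
    queue = deque([failed])
    while queue:  # breadth-first walk over downstream tasks
        current = queue.popleft()
        for nxt in dependents.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Hypothetical data-analysis workflow.
deps = {
    "load": set(),
    "clean": {"load"},
    "train": {"clean"},
    "report": {"train"},
    "archive": {"load"},
}
print(sorted(affected_tasks(deps, "clean")))
```

Tasks outside the returned set (here `load` and `archive`) can keep running untouched, which is the point of re-planning only part of the graph.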
- Research Article
- 10.3390/s26061958
- Mar 20, 2026
- Sensors (Basel, Switzerland)
- Xiaoyun Liang + 1 more
Widespread acceptance of collaborative robots in human-involved scenarios requires accessible and intuitive interfaces for lay workers and non-expert users. Existing interfaces often rely on users to plan and issue low-level commands, necessitating extensive knowledge of robot control. This study proposes a multimodal agentic AI framework integrating natural user interfaces (NUIs) to foster effortless human-like partnerships in human-robot collaboration (HRC), enhancing intuitiveness and operational efficiency. First, it allows users to instruct robots verbally in plain language, coupled with gaze to indicate objects precisely. Second, it offloads the user's robot motion planning workload by understanding context and reasoning about task decomposition. Third, by coordinating AI agents built on large language models (LLMs), the system interprets users' requests effectively and provides feedback to establish transparent communication. This proof-of-concept study included experiments to demonstrate a practical implementation of the agentic AI framework on a mobile manipulation robot in a collaborative human-robot wood assembly task. Seven participants were recruited to interact with this AI-integrated agentic robotic system. Task performance and user experience were measured in terms of completion time, intervention rate, and workload (NASA TLX survey), and insights into practical applications were summarized through a qualitative analysis. This study highlights the potential of NUIs and agentic AI-embodied robots to overcome existing HRC barriers and contributes to improving HRC intuitiveness and efficiency.
- Research Article
- 10.1142/s2301385028500057
- Mar 18, 2026
- Unmanned Systems
- Kun Yang + 3 more
Multi-agent pursuit-evasion has significant applications in military, transportation, and industrial sectors. This task faces dual challenges of uncertain swarm scale and environmental uncertainty within unstructured dynamic environments. To address these, we propose a hierarchical reinforcement learning framework based on permutation invariance to balance pursuit efficiency and collision avoidance safety. First, in the evaluation stage, we introduce a hybrid feature aggregation mechanism based on the DeepSets structure and a predicted intercept point auxiliary task. By extracting permutation-invariant group features and introducing kinematic priors, this approach achieves zero-shot transfer and rapid convergence of the method across different swarm scales. Second, in the execution stage, we construct a residual gating architecture based on task decomposition. This architecture utilizes a frozen basic tracking stream to handle the global game and employs a conditioned residual stream with a dynamic gating mechanism to manage local obstacle avoidance, effectively resolving the multi-objective conflict problem under sparse rewards. Finally, extensive simulations and physical experiments demonstrate that the proposed method, after training on small-scale swarms, achieves robust zero-shot transfer to medium and large-scale swarms, completing efficient and stable pursuit tasks. Furthermore, the method was successfully transferred to a physical platform, completing tasks smoothly even in the presence of faulty agents, thereby validating its applicability in the real world.
- Research Article
- 10.1080/09544828.2026.2639928
- Mar 6, 2026
- Journal of Engineering Design
- Meno-Said Haddad + 1 more
Generative intelligent design systems typically remain static: they generate plausible artifacts but do not adapt through use or internalise expert design knowledge in a principled way. To address this limitation, we propose reinforcement learning from verifiable feedback (RLVF) as a framework for codified human–AI collaboration, in which domain expertise is formalised as algorithmic verifiers that evaluate generated designs against explicit structural and logical rules and convert rule compliance into reinforcement signals. We instantiate RLVF on the engineering task of functional decomposition, where designs are represented as typed function–flow graphs governed by verifiable structural constraints. Using group relative policy optimisation, a Llama-3.1-8B model is trained exclusively from verifier feedback without labelled output data. Across reinforcement learning cycles, correct output formatting improves from 18% to 100%, fully connected functional structures improve from 10% to 100%, and error-free decompositions improve from 4% to 100%, exceeding a supervised fine-tuned baseline trained on 257 labelled examples. Human evaluation shows that RLVF-trained models achieve higher perceived logical coherence than supervised models, while exhibiting reduced structural diversity. These results demonstrate that verifiable feedback can replace human annotation in rule-governed design tasks and enable correctness-grounded generative design systems based on codified expert knowledge.
- Research Article
- 10.1186/s40359-026-04204-2
- Mar 5, 2026
- BMC psychology
- Hanhui Li + 2 more
This study advances a task-focused perspective on AI dependency by integrating Cognitive Load Theory and Self-Determination Theory and by specifying cognitive load, future anxiety, and task motivation as correlates linking task complexity with AI dependency. Practically, the results suggest actionable directions for course and assessment design (e.g., calibrating task complexity, scaffolding task decomposition and staged feedback, and supporting motivation and emotion management) to help students balance AI use with autonomous learning.
- Research Article
- 10.1016/j.rineng.2026.109726
- Mar 1, 2026
- Results in Engineering
- Bingyu Cao + 3 more
FEMT-YOLO: Frequency-enhanced multi-scale network for small object detection in aerial images
- Research Article
- 10.1109/tpami.2025.3626772
- Mar 1, 2026
- IEEE transactions on pattern analysis and machine intelligence
- Zanlin Ni + 7 more
Recent advances in image synthesis have been propelled by powerful generative models, such as Masked Generative Transformers (MaskGIT), autoregressive models, diffusion models, and rectified flow models. A common principle behind their success is the decomposition of complex synthesis tasks into multiple tractable steps. However, this introduces a proliferation of step-specific parameters to be configured for modulating the iterative generation process (e.g., mask ratio, noise level, or temperature at each step). Existing approaches typically rely on manually-designed scheduling rules to manage this complexity, demanding expert knowledge and extensive trial-and-error. Furthermore, these static schedules lack the flexibility to adapt to the unique characteristics of each individual sample, yielding sub-optimal performance. To address this issue, we present AdaGen, a general, learnable, and sample-adaptive framework for scheduling the iterative generation process. Specifically, we formulate the scheduling problem as a Markov Decision Process, where a lightweight policy network is introduced to adaptively determine the most suitable parameters given the current generation state, and can be trained through reinforcement learning. Importantly, we demonstrate that simple reward designs, such as FID or pre-trained reward models, can be easily hacked and may not reliably guarantee the desired quality or diversity of generated samples. Therefore, we propose an adversarial reward design to guide the training of the policy networks effectively. Finally, we introduce an inference-time refinement strategy and a controllable fidelity-diversity trade-off mechanism to further enhance the performance and flexibility of AdaGen. Comprehensive experiments across five benchmark datasets (ImageNet-256 × 256 & 512 × 512, MS-COCO, CC3M, and LAION-5B) and four distinct generative paradigms validate the superiority of AdaGen.
For example, AdaGen achieves better performance on DiT-XL with $\sim 3\times$ lower inference cost and improves the FID of VAR from 1.92 to 1.59 with negligible additional computational overhead.
- Research Article
- 10.1109/tfuzz.2025.3647956
- Mar 1, 2026
- IEEE Transactions on Fuzzy Systems
- Zefan Zeng + 5 more
Event causality identification (ECI) aims to detect causal relationships between events in textual contexts. Existing ECI models predominantly rely on supervised methodologies, suffering from dependence on large-scale annotated data. Although large language models (LLMs) enable zero-shot ECI, they are prone to causal hallucination, erroneously establishing spurious causal links. To address these challenges, we propose MEFA, a novel zero-shot ECI model based on multisource evidence fuzzy aggregation. First, we decompose causality reasoning into three main tasks (temporality determination, necessity analysis, and sufficiency verification) complemented by three auxiliary tasks. Second, leveraging meticulously designed prompts, we guide LLMs to generate uncertain responses and deterministic outputs. Finally, we quantify the LLM's responses to the subtasks and employ fuzzy aggregation to integrate this evidence for causality scoring and causality determination. Extensive experiments on three benchmarks demonstrate that MEFA outperforms second-best unsupervised baselines by 6.2% in $F1$-score and 9.3% in precision, while significantly reducing hallucination-induced errors. In-depth analyses verify the effectiveness of task decomposition and the superiority of fuzzy aggregation.
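To make the idea of aggregating subtask evidence concrete, the sketch below combines three membership degrees in [0, 1] into one causality score. The abstract does not specify MEFA's aggregation operator, so both operators shown here (the minimum t-norm and a weighted mean), the example scores, and the 0.5 decision threshold are assumptions chosen for illustration.

```python
def aggregate_min(evidence):
    """Fuzzy AND (minimum t-norm): the score is limited by the weakest evidence."""
    return min(evidence.values())


def aggregate_weighted(evidence, weights):
    """Weighted mean of membership degrees; weights need not sum to 1."""
    total = sum(weights[name] for name in evidence)
    return sum(evidence[name] * weights[name] for name in evidence) / total


# Hypothetical quantified LLM responses for one event pair.
evidence = {"temporality": 0.9, "necessity": 0.6, "sufficiency": 0.75}
equal_weights = {name: 1.0 for name in evidence}

strict_score = aggregate_min(evidence)
soft_score = aggregate_weighted(evidence, equal_weights)
is_causal = soft_score >= 0.5  # assumed decision threshold

print(strict_score, soft_score, is_causal)
```

The minimum operator treats the three conditions as jointly required, while the weighted mean lets strong evidence on one subtask partly offset weaker evidence on another.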
- Research Article
- 10.1007/s11701-026-03247-2
- Feb 24, 2026
- Journal of robotic surgery
- Francesco Brucchi + 4 more
The adoption of robotic platforms in bariatric and metabolic surgery has increased steadily, raising important questions regarding how surgeons are trained to safely acquire robotic skills. While structured and competency-based training models are increasingly adopted in other fields of robotic surgery, training approaches in robotic bariatric surgery remain less standardized. A systematic review was conducted in accordance with PRISMA 2020 guidelines to identify studies describing structured training pathways or formal curricula for robotic bariatric surgery. PubMed, Embase, Scopus, and Cochrane Library were searched from inception without date restrictions. Eligible studies explicitly reported training programs, curricula, or educational pathways for robotic bariatric procedures. Learning curve analyses without a defined curriculum were excluded. Data were synthesized using a structured narrative approach. Five studies met the inclusion criteria. Training models included stepwise intraoperative curricula, simulation-based and proficiency-driven programs, and modular educational interventions. Common components across curricula were simulation training, task decomposition, supervised progression, and defined competency benchmarks. Assessment strategies were heterogeneous and ranged from simulation-based proficiency thresholds to operative participation metrics and subjective workload measures. No study reported standardized certification or long-term competency outcomes. Structured training pathways for robotic bariatric surgery have been described and incorporate elements aimed at supporting safe skill acquisition. However, existing curricula remain heterogeneous and lack standardized assessment frameworks. Future efforts should focus on developing competency-driven and proficiency-based progression training models to support reproducible and safe adoption of robotic bariatric surgery.
- Research Article
- 10.1097/dm-2025-00015
- Feb 24, 2026
- Digital Medicine
- Hao Kechun + 1 more
With the rapid development of large language models (LLMs) in natural language processing and generation tasks, researchers are continuously validating their application potential in the field of medical artificial intelligence (AI). Generative Pre-trained Transformer 5 (GPT-5), Pathways Language Model 2 (PaLM-2), Large Language Model Meta AI 3 (Llama-3), Gemini-2.5, and various medical vertical models have demonstrated excellent performance in many medical and health-related tasks. However, LLMs still face key bottlenecks in real-world medical scenarios, including “illusions,” slow knowledge updates, and a lack of interpretability. These problems severely restrict the safe deployment of large language models in high-risk medical scenarios and have become a major obstacle to the transition of medical artificial intelligence from research to clinical application. Retrieval-augmented generation (RAG) technology has therefore become an important solution for improving the credibility of medical LLMs. RAG significantly reduces the risk of information errors by retrieving authoritative medical knowledge before generation while maintaining knowledge updates without retraining. A review of research progress on RAG in the medical field is of great theoretical value and practical significance for promoting the design, evaluation, and standardized application of trustworthy medical AI. This review aims to systematically summarize the research progress of RAG in medical scenarios, including the technical frameworks, typical applications, and development trends of three types of methods (naive RAG, advanced RAG, and modular RAG), and further discuss their significance in medical reliability, health equity, and personalized medicine. 
The naive RAG architecture, through a standard “index-retrieval-generation” process, transforms external knowledge bases into a structured vector space that can be queried by an LLM, achieving positive results in electronic medical record (EMR) summary, preoperative assessment, and medical question answering (QA). However, naive RAG still has limitations in retrieval accuracy and cross-modal processing. To address these shortcomings, advanced RAG methods have been devised that significantly improve the model’s decision-making capabilities by improving retrieval strategies, enhancing inference chains, and introducing self-reflection mechanisms. Furthermore, modular RAG design offers composable, multi-module systems to support multi-source knowledge integration and complex task decomposition. In terms of application value, RAG’s contributions to the healthcare field are primarily reflected in these aspects. On the one hand, it significantly improves the reliability of medical AI by introducing traceable knowledge to reduce illusions. On the other hand, it helps promote healthcare equity by driving a shift toward a patient-centered healthcare model through localized knowledge bases and multilingual support. However, the deployment of RAG in medical scenarios still faces many challenges, such as data privacy risks. Future research should focus on improving self-supervised reflection capabilities and developing cross-modal knowledge fusion technologies. As research develops, RAG is expected to become a core foundational capability of future intelligent healthcare systems, further promoting the development of safe and efficient intelligent healthcare.
- Research Article
- 10.1093/schbul/sbag003.125
- Feb 13, 2026
- Schizophrenia Bulletin
- Lingjuan Li + 3 more
Abstract Background Under the backdrop of heavy academic workloads and intense competition in universities, academic stress-related depression is on the rise. Traditional mental health courses often remain at the level of knowledge dissemination, lacking targeted interventions and quantitative assessments for high-risk students. University education management departments need intervention effectiveness evidence that can be directly used for course optimization and student management. Therefore, this study evaluated the intervention effect of mental health education on academic stress-related depression from the perspective of university education management, aiming to provide new insights into reducing students’ levels of academic stress-related depression. Methods The study selected 412 students at risk of academic stress-related depression from three universities. All students had a Patient Health Questionnaire-9 (PHQ-9) score ≥ 10 and a learning stress scale score ≥ 60. They were randomly assigned by department and class: ① Conventional mental health course group (n = 206); ② Structured education management intervention + mental health course group (n = 206). The intervention lasted for 6 weeks and was jointly designed by the university’s education management department and the mental health center. It included four structured modules: ① Stress quantification and retrospection: constructing a personal stress curve based on weekly learning workload and subjective stress levels; ② Academic time management reconstruction: through course task decomposition, weekly goal tiering, and learning pace reallocation; ③ Peer support group: weekly emotion-task debriefing conducted in groups of 6-8 people guided by trained mentors; ④ Academic planning guidance: providing individualized academic path optimization and phased progress monitoring based on students’ course performance and course selection structure. 
The main indicators included the PHQ-9, the Generalized Anxiety Disorder-7 Questionnaire (GAD-7), the learning stress scale, learning engagement, and course completion rate. Repeated measures ANOVA and a structural model of the effects of educational management were used for analysis. Results After 6 weeks of intervention, the PHQ-9 score in the structured management intervention group decreased from 13.42 ± 2.91 to 7.58 ± 2.63, a decrease of 43.53%, significantly better than the control group (p < .001). The learning stress scale score decreased by -6.21 ± 1.84 in the management intervention group, more than twice that of the control group. The learning engagement in the structured management intervention group increased by 26.42% compared to before the intervention, while the control group only increased by 9.73%, a highly significant difference (p < .01). The course completion rate in the structured management intervention group increased by 14.24% compared to before the intervention, while the control group only increased by 4.15% after the intervention. Structural equation modeling shows that the indirect effect of improved academic time management on depression accounts for 37.85% of the total effect. Discussion Collaborative intervention by higher education management and mental health education can significantly reduce the level of academic stress-related depression. Regular courses can only produce limited emotional relief, while management intervention can significantly improve learning pace and perceived sense of control. Future research will further develop a semester-level dynamic intervention model and incorporate learning behavior log data to improve accuracy.
- Research Article
- 10.47392/irjaeh.2026.0046
- Jan 27, 2026
- International Research Journal on Advanced Engineering Hub (IRJAEH)
- Abdul Mateen + 4 more
This project presents the design and implementation of an intelligent AI agent capable of performing automated actions within a web browser environment. The proposed system integrates natural-language understanding, task decomposition, and browser-level automation to execute user-defined goals such as data extraction, form submission, website navigation, report generation, and repetitive workflow operations. The agent combines machine learning models with rule-based logic to accurately interpret user instructions, convert them into executable steps, and interact with web elements in real time. To ensure robustness, a lightweight automation framework is incorporated to manage element detection, handle dynamic page layouts, and recover from unexpected interface changes or errors. The system is further enhanced with decision-making capabilities that allow the agent to adapt its actions based on webpage behavior, user constraints, and context awareness. Experimental evaluation demonstrates that the AI agent significantly reduces manual effort, improves operational accuracy, and accelerates digital processes when compared to conventional browser automation tools or static scripts. Overall, this work highlights the growing potential of AI-driven autonomous agents in modern web environments and establishes a practical foundation for future advancements in self-guided, multi-step browser task execution across various domains.
- Research Article
- 10.1038/s41598-026-37025-9
- Jan 23, 2026
- Scientific reports
- Xingyi Liu + 5 more
With global aging, assessing functional status is vital for precision medicine. Electronic Health Records (EHRs), particularly unstructured data, hold abundant information on patient mobility. This study explores using Large Language Models (LLMs) to extract and standardize mobility status from unstructured EHR data (i.e., clinical notes). We annotated 600 clinical notes from three healthcare institutions located in southeastern Minnesota and west-central Wisconsin, focusing on expressions of mobility and associated impairment. Leveraging the open-source Llama 3 model, we tested various prompting strategies, including zero-shot, few-shot, and task decomposition, and evaluated their performance. Error analysis showed that while the model sometimes inferred impairments without explicit evidence, most errors were clinically reasonable, often reflecting borderline or ambiguous cases. Our final model achieved a patient-level micro-average F1-score of 0.876 [95% CI 0.858-0.894] for Mobility Extraction and 0.897 [95% CI 0.878-0.917] for Impairment Classification. A secondary analysis counting "clinically reasonable inferences" as correct, performed to assess clinical plausibility, yielded F1-scores of 0.962 [95% CI 0.952-0.971] and 0.948 [95% CI 0.936-0.960], respectively. A local, deterministic setup improved trustworthiness by ensuring consistent outputs, safeguarding privacy, and demonstrating cross-institution generalizability. These findings highlight the feasibility of LLM-based solutions for extracting mobility functional status from unstructured EHR data, supporting both clinical applications and research.
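For readers unfamiliar with the metric, a micro-averaged F1-score pools true positives, false positives, and false negatives across all units before computing precision and recall. The sketch below shows that calculation; the per-patient count tuples are made-up numbers, not data from the study, and the study's exact patient-level aggregation procedure is not described in the abstract.

```python
def micro_f1(counts):
    """Micro-averaged F1 from an iterable of (tp, fp, fn) tuples.

    Counts are summed over all units first, so units with more
    labels contribute more to the final score.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    if tp == 0:
        return 0.0  # avoids division by zero; no correct positives means F1 = 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical (tp, fp, fn) counts for two patients.
per_patient = [(8, 1, 1), (2, 1, 1)]
print(round(micro_f1(per_patient), 3))
```

Because the counts are pooled, the result equals 2·TP / (2·TP + FP + FN) over the whole dataset.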
- Research Article
- 10.4018/ijkm.399498
- Jan 22, 2026
- International Journal of Knowledge Management
- Li Chen
Under the national strategy to strengthen culture, museum volunteers—key providers of education and interpretation—face management challenges such as a 45% annual turnover and weak cross-department collaboration. This study builds a synergistic model of “role positioning–collaboration mechanism–cultural identity” and evaluates it via mixed methods: longitudinal quantitative tracking of 1,284 volunteers in six first-class museums over two years (service logs, social-network, and identity metrics) plus 42 in-depth interviews. The role dimension uses dynamic allocation to improve task fit; the collaboration dimension applies digital platforms and WBS task decomposition; the cultural-identity dimension uses participatory rituals to deepen intrinsic motivation. Results show job–role matching reached 87.3%, network density rose 104.8%, cultural identity increased annual service time by 43%, and team–culture synergy raised innovative behavior to 73.5%. Findings support a “system–culture” dual-drive model for sustainable volunteer management.
- Research Article
- 10.63313/jcsft.9037
- Jan 16, 2026
- Journal of Computer Science and Frontier Technologies
- Shuai Dong
Accurate apple detection in complex orchards remains challenging due to foliage occlusion, illumination variations, and cluttered backgrounds. This study proposes an enhanced YOLOv11n framework integrating three architectural innovations. First, the EMCSP (EMA-enhanced Cross-Stage Partial) module is introduced into the backbone, synergistically incorporating multi-scale attention within cross-stage partial topology to strengthen discriminative feature extraction. Second, the ELA-HSFPN (Efficient Local Attention enhanced Hierarchical Scale Feature Pyramid Network) is devised for the neck, leveraging decoupled spatial attention and bidirectional hierarchical fusion to enhance multi-scale representation. Third, the TADDH (Task-Aligned Dynamic Detection Head) supersedes the conventional head, employing task decomposition, dynamic deformable convolution, and probabilistic feature modulation to achieve optimal classification-localization alignment. Extensive experiments demonstrate substantial improvements over baseline YOLOv11n: Precision +1.4%, Recall +2.3%, mAP@0.5 +3.0%, and mAP@0.5:0.95 +1.7%. These results validate the efficacy of our methodology for intelligent fruit harvesting applications.
- Research Article
- 10.3390/electronics15020353
- Jan 13, 2026
- Electronics
- Bao Rong Chang + 2 more
Traditional automated monitoring systems adopted for intersection traffic control still face challenges, including high costs, maintenance difficulties, insufficient coverage, poor multimodal data integration, and limited traffic information analysis. To address these issues, the study proposes a sovereign AI-driven Smart Transportation governance approach, developing a mobile AI solution equipped with multimodal perception, task decomposition, memory, reasoning, and multi-agent collaboration capabilities. The proposed system integrates computer vision, multi-object tracking, natural language processing, Retrieval-Augmented Generation (RAG), and Large Language Models (LLMs) to construct a Pipeline-based Traffic Analysis System (PTAS). The PTAS can produce real-time statistics on pedestrian and vehicle flows at intersections, incorporating potential risk factors such as traffic accidents, construction activities, and weather conditions for multimodal data fusion analysis, thereby providing forward-looking traffic insights. Experimental results demonstrate that the enhanced DuCRG-YOLOv11n pre-trained model, equipped with our proposed new activation function βsilu, can accurately identify various vehicle types in object detection, achieving a frame rate of 68.25 FPS and a precision of 91.4%. Combined with ByteTrack, it can track over 90% of vehicles in medium- to low-density traffic scenarios, obtaining a MOTA of 0.719 and a MOTP of 0.08735. In traffic flow analysis, the RAG of Vertex AI, combined with Claude Sonnet 4 LLMs, provides a more comprehensive view, precisely interpreting the causes of peak-hour congestion and effectively compensating for missing data through contextual explanations. The proposed method can enhance the efficiency of urban traffic regulation and optimize decision support in intelligent transportation systems.
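For readers unfamiliar with the tracking metric quoted above, the following is a minimal sketch of how MOTA is commonly computed under the CLEAR MOT definition: one minus the ratio of accumulated errors (missed objects, false positives, identity switches) to the total number of ground-truth objects. The abstract does not state which evaluation tool or matching rule the authors used, so the `FrameCounts` type, the `mota` function, and the per-frame counts below are hypothetical and purely illustrative.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameCounts:
    """Per-frame tracking counts (hypothetical container for illustration)."""
    ground_truth: int      # number of ground-truth objects in the frame
    false_negatives: int   # ground-truth objects the tracker missed
    false_positives: int   # tracker outputs with no matching ground truth
    id_switches: int       # matched objects whose track identity changed


def mota(frames: Iterable[FrameCounts]) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    frames = list(frames)
    total_gt = sum(f.ground_truth for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined when there are no ground-truth objects")
    total_errors = sum(
        f.false_negatives + f.false_positives + f.id_switches for f in frames
    )
    return 1.0 - total_errors / total_gt


# Illustrative counts only; these are not data from the paper.
frames = [
    FrameCounts(ground_truth=10, false_negatives=1, false_positives=1, id_switches=0),
    FrameCounts(ground_truth=10, false_negatives=2, false_positives=0, id_switches=1),
]
score = mota(frames)
print(f"MOTA = {score:.3f}")
```

Because errors are summed before dividing, MOTA can be negative when a tracker makes more errors than there are ground-truth objects. MOTP is defined differently across benchmarks (average localization distance in some, average overlap in others), and the abstract does not say which convention applies to the reported 0.08735.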
- Research Article
- 10.47852/bonviewaia62026154
- Jan 12, 2026
- Artificial Intelligence and Applications
- Dhaya Ramakrishnan + 1 more
The intersection of artificial intelligence and neuroscience has resulted in the development of brain-inspired computational frameworks that simulate the human brain’s hierarchical decision-making and learning. In this work, we propose a Hierarchical Brain-Inspired Reinforcement Learning (HBRL) architecture that combines the benefits of Deep Reinforcement Learning (DRL) with a biologically inspired cognitive hierarchy. The proposed architecture simulates cortical–subcortical information processing: a high-level Policy-Gradient manager conducts abstract, long-term planning, and low-level Deep Q-Network (DQN) agents carry out real-time, short-term actions. The architecture’s multilayer structure provides temporal abstraction, modular learning, and the ability to refine policies from experience, which makes it appropriate for dynamic and uncertain environments. We evaluated HBRL in three common scenarios: GridWorld, autonomous vehicle navigation, and smart-city infrastructure control. Compared with baseline approaches (e.g., DQN, Proximal Policy Optimization, and Soft Actor-Critic), HBRL with its high-level and low-level agents achieved a 15%–20% higher task completion rate, 1.4–2.4 times faster learning, and 70–100 points higher cumulative reward. Two-tailed t-tests indicated that the improvements were significant (p < 0.01) in all tested environments. The hierarchical decomposition of tasks both promotes convergence and improves agents’ generalization to unseen conditions. Overall, the proposed HBRL framework provides a scalable, cognitively inspired learning paradigm for developing intelligent autonomous systems that exhibit human-like adaptability and efficient decision-making in complex, nonstationary real-world environments.
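The manager/worker split described above can be illustrated with a toy two-level control loop on a one-dimensional corridor. This is only a sketch of temporal abstraction, not the HBRL method: both levels here are hand-written rules rather than learned policies, and every name (`manager_pick_subgoal`, `worker_act`, `run_episode`, `horizon`) is a hypothetical choice made for this example. In a learned system of the kind the abstract describes, the first function would presumably be replaced by a policy-gradient manager and the second by a DQN agent.

```python
from typing import List, Tuple


def manager_pick_subgoal(position: int, goal: int, horizon: int) -> int:
    """High level: pick a waypoint at most `horizon` cells toward the goal."""
    step = max(-horizon, min(horizon, goal - position))
    return position + step


def worker_act(position: int, subgoal: int) -> int:
    """Low level: return one primitive action (-1, 0, or +1) toward the subgoal."""
    if subgoal > position:
        return 1
    if subgoal < position:
        return -1
    return 0


def run_episode(
    start: int, goal: int, horizon: int = 3, max_steps: int = 50
) -> Tuple[int, int, List[int]]:
    """Alternate between manager decisions and worker steps until the goal."""
    position, steps = start, 0
    subgoals: List[int] = []
    while position != goal and steps < max_steps:
        # The manager decides only when the previous subgoal has been reached.
        subgoal = manager_pick_subgoal(position, goal, horizon)
        subgoals.append(subgoal)
        # The worker takes primitive steps until it reaches the subgoal.
        while position != subgoal and steps < max_steps:
            position += worker_act(position, subgoal)
            steps += 1
    return position, steps, subgoals


final_position, steps_taken, subgoals = run_episode(start=0, goal=10)
print(final_position, steps_taken, subgoals)
```

The point of the structure is that the manager makes far fewer decisions than the worker (one per subgoal rather than one per step), which is the temporal abstraction the abstract refers to.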
Received: 14 May 2025 | Revised: 16 October 2025 | Accepted: 19 December 2025
Conflicts of Interest: The authors declare that they have no conflicts of interest to this work.
Data Availability Statement: Data sharing is not applicable to this article as no new data were created or analyzed in this study.
Author Contribution Statement: Kanthavel Radhakrishnan: Conceptualization, Methodology, Formal analysis, Resources, Data curation, Writing – original draft, Writing – review & editing, Visualization. Dhaya Ramakrishnan: Software, Validation, Investigation, Writing – original draft, Writing – review & editing, Supervision, Project administration.