Small groups nurturing collective wisdom: The punishment-prediction reinforcement learning mechanism for multi-group cooperation
- 10.1016/j.amc.2025.129533
- Nov 1, 2025
- Applied Mathematics and Computation
- 10.1038/ncomms4976
- May 29, 2014
- Nature Communications
- 10.1007/s11432-023-4287-2
- Apr 25, 2025
- Science China Information Sciences
- 10.1016/j.chaos.2025.115998
- Mar 1, 2025
- Chaos, Solitons & Fractals
- 10.1126/science.1198349
- Nov 11, 2010
- Science
- 10.1088/1367-2630/18/10/103007
- Oct 1, 2016
- New Journal of Physics
- 10.1016/j.physa.2024.129999
- Jul 26, 2024
- Physica A: Statistical Mechanics and its Applications
- 10.1016/j.physleta.2018.05.020
- May 17, 2018
- Physics Letters A
- 10.1016/j.chaos.2025.116496
- Aug 1, 2025
- Chaos, Solitons & Fractals
- 10.1038/nature04605
- May 1, 2006
- Nature
- Research Article
- 10.1016/j.neucom.2012.04.012
- May 10, 2012
- Neurocomputing
Thalamic cooperation between the cerebellum and basal ganglia with a new tropism-based action-dependent heuristic dynamic programming method
- Supplementary Content
- 10.1155/2022/1327620
- Jul 1, 2022
- Computational Intelligence and Neuroscience
To realize effective cooperation between editor agents in a film and television editing collaboration system, this article analyzes how the state of the editing and production process changes under the cross-influence of multiple factors, so that a single agent can no longer satisfy current film and television production needs. From a systems-theory perspective, the article constructs learner agents for the editing system by introducing a new cooperation mechanism, a multi-agent collaborative system model. Collaboration among multiple agents and reinforcement learning between editor agents are realized within the editing system: separately organized agents are brought together to cooperate and work in harmony, achieving the collaborative effect of the film and television editing system and improving interaction efficiency between editor agents. This cooperative learning approach allows editor agents to collaborate successfully. A Bayesian technique is used to assess the likelihood of effective cooperation between two agents, and a trust model based on this technique is presented, making up for the shortcomings of existing collaborative learning systems. The multi-agent collaboration system is then used for production in the film and television editing collaboration system, where many scenes and segments are created with computer-generated special effects, giving viewers a distinctive audiovisual experience.
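A minimal sketch of the Bayesian trust idea described above, assuming a simple Beta-Bernoulli model of cooperation outcomes (the class name, priors, and interaction log below are illustrative, not the paper's implementation):

```python
# Minimal sketch (not the paper's implementation): a Beta-Bernoulli trust
# estimate of how likely a partner agent is to cooperate successfully,
# updated from observed interaction outcomes.
from dataclasses import dataclass

@dataclass
class BetaTrust:
    alpha: float = 1.0  # prior pseudo-count of successful cooperations
    beta: float = 1.0   # prior pseudo-count of failed cooperations

    def update(self, success: bool) -> None:
        if success:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    def expected_trust(self) -> float:
        # Posterior mean of P(cooperation succeeds) under the Beta model.
        return self.alpha / (self.alpha + self.beta)

trust = BetaTrust()
for outcome in [True, True, False, True]:   # hypothetical interaction log
    trust.update(outcome)
print(round(trust.expected_trust(), 3))     # -> 0.667
```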
- Research Article
- 10.3233/ifs-130945
- Jan 1, 2014
- Journal of Intelligent & Fuzzy Systems
Nash Q-learning and team Q-learning are extended versions of reinforcement learning for use in multi-agent systems as cooperation mechanisms. The complexity of multi-agent reinforcement learning systems is extremely high, so it is necessary to use complexity-reduction methods such as hierarchical structures, abstraction, and task decomposition. A typical approach to task decomposition defines subtasks by extracting bottlenecks. In this paper, bottlenecks are automatically extracted to create temporally extended actions, which are in turn added to the agents' available actions in the cooperation mechanisms of multi-agent systems. The updating equations of team Q-learning and Nash Q-learning are extended to involve temporally extended actions, which considerably increases learning performance in both methods. The experimental results show a clear improvement in the learning of cooperation mechanisms when they are augmented with the extracted temporally extended actions in multi-agent problems.
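A minimal sketch of how a team Q-learning update might incorporate a temporally extended action, using SMDP-style discounting over the k primitive steps the option runs (state names, the option label, and constants are hypothetical; the paper's exact update equations are not reproduced here):

```python
# Minimal sketch, not the paper's exact formulation: a team Q-learning update
# in which a temporally extended action (option) lasting k primitive steps is
# treated as a single joint action with SMDP-style discounting.
from collections import defaultdict

GAMMA, ALPHA = 0.95, 0.1
Q = defaultdict(float)           # keys: (state, joint_action)

def team_q_update(state, joint_action, cumulative_reward, k, next_state, joint_actions):
    """joint_action may be primitive (k=1) or an option executed for k steps;
    cumulative_reward is the discounted return accumulated while it ran."""
    best_next = max(Q[(next_state, a)] for a in joint_actions)
    target = cumulative_reward + (GAMMA ** k) * best_next
    Q[(state, joint_action)] += ALPHA * (target - Q[(state, joint_action)])

# Hypothetical usage: two agents, option "goto_bottleneck" ran for 4 steps.
joint_actions = [("up", "up"), ("down", "down"), ("goto_bottleneck", "goto_bottleneck")]
team_q_update("room_A", ("goto_bottleneck", "goto_bottleneck"),
              cumulative_reward=2.3, k=4, next_state="bottleneck",
              joint_actions=joint_actions)
```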
- Video Transcripts
- 10.48448/zvx0-ef72
- Jul 4, 2021
This study examined the influence of metacognition on declarative and reinforcement learning (RL) mechanisms. We collected data from 218 undergraduates using a within-subjects metacognitive manipulation of a stimulus-response (S-R) learning task created by Collins (2018). Contributions of declarative and RL mechanisms are assessed by differences in learning rate for blocks of 3 items versus 6 items, and by the rate of forgetting with an incidental post-test. If metacognition differentially affects declarative and RL mechanisms, we expect a three-way interaction between task phase (learning/post-test), block type (long/short), and metacognition (before/during). Our results showed significant main effects of phase (F(1,217) = 143.18, p = 9.18e-32), length (F(1,217) = 541.11, p = 2.06e-104), and metacognition (F(1,217) = 19.78, p = 9.22e-06), with better performance during the learning phase, short blocks, and the metacognitive manipulation. A significant phase-by-metacognition interaction (F(1,217) = 8.11, p = 4.45e-03) suggested that metacognitive monitoring improved test performance while having little effect on learning performance.
- Conference Article
- 10.1109/cec.2008.4630945
- Jun 1, 2008
This paper presents a population-based metaheuristic adopting the metaphor of social autonomous agents. In this context, agents cooperate and self-adapt in order to collectively solve a given optimization problem. From an evolutionary computation point of view, the mechanisms driving the search combine intensification and diversification operators, such as local search and mutation or recombination. The multi-agent paradigm mainly focuses on the adaptive capabilities of individual agents evolving in a context of decentralized control and asynchronous communication. In the proposed metaheuristic, each agent's behavior is guided by a decision process for the choice of operators, which is dynamically adapted during the search using reinforcement learning and mimetism learning between agents. The approach is called Coalition-Based Metaheuristic (CBM) to refer to the strong autonomy conferred to the agents. This approach is applied to the Vehicle Routing Problem to emphasize the performance of the learning and cooperation mechanisms.
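A minimal sketch of the operator-selection idea, assuming an epsilon-greedy value estimate per search operator (the operator names, learning rate, and reward definition are illustrative, not CBM's actual rules):

```python
# Minimal sketch (assumed details, not the CBM paper's exact rules): each agent
# keeps a value per search operator, picks operators epsilon-greedily, and
# reinforces operators that improved the current solution.
import random

class OperatorSelector:
    def __init__(self, operators, alpha=0.2, epsilon=0.1):
        self.values = {op: 0.0 for op in operators}
        self.alpha, self.epsilon = alpha, epsilon

    def choose(self):
        if random.random() < self.epsilon:
            return random.choice(list(self.values))
        return max(self.values, key=self.values.get)

    def reinforce(self, op, improvement):
        # Reward is the improvement in solution cost produced by the operator.
        self.values[op] += self.alpha * (improvement - self.values[op])

selector = OperatorSelector(["local_search", "mutation", "recombination"])
op = selector.choose()
selector.reinforce(op, improvement=1.5)   # hypothetical cost reduction
```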
- Conference Article
- 10.2514/6.2023-1070
- Jan 19, 2023
In this paper, we present a detailed framework for the verification and validation of learning-based reinforcement learning (RL) mechanisms in aerospace control software. First, we integrate an adversarial input mitigation and moving target defense framework and verify its efficacy in real-time. Then, we provide a testing framework to verify the robustness of closed-loop RL mechanisms. The reliability of the adversarially robust RL mechanism is tested using the VerifAI toolkit and an X-plane 11 Cessna 172.
- Research Article
- 10.1007/s10845-024-02494-0
- Oct 13, 2024
- Journal of Intelligent Manufacturing
Self-organizing manufacturing networks have emerged as a viable solution for adaptive manufacturing control within the mass personalization paradigm. This approach involves three critical elements: system modeling and control architecture, interoperable communication, and adaptive manufacturing control. However, current research often treats interoperable communication and adaptive manufacturing control as isolated areas of study. To address this gap, this paper introduces a Knowledge Graph-enhanced Multi-Agent Reinforcement Learning (MARL) method that integrates interoperable communication via Knowledge Graphs with adaptive manufacturing control through Reinforcement Learning. We hypothesize that implicit domain knowledge obtained from historical production job allocation records can guide each agent to learn more effective scheduling policies at an accelerated learning rate, on the premise that machine assignment preferences can effectively reduce the Reinforcement Learning search space. Specifically, we redesign the machine agents with new observation, action, reward, and cooperation mechanisms that account for machine preferences, building upon our previous MARL base model. The scheduling policies are trained in extensive simulation experiments that consider manufacturing requirements. During training, our approach demonstrates improved training speed compared with individual Reinforcement Learning methods under the same training hyperparameters. The scheduling policies generated by our Knowledge Graph-enhanced MARL also outperform both individual Reinforcement Learning methods and heuristic rules under dynamic manufacturing settings.
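A minimal sketch of the stated premise that machine-assignment preferences shrink the search space, assuming preferences distilled from a knowledge graph are used to mask a machine agent's candidate jobs before epsilon-greedy selection (the preference table, names, and values are hypothetical, not the paper's design):

```python
# Minimal sketch (assumed interfaces): preferences mined from historical
# job-allocation records mask a machine agent's action space before
# epsilon-greedy selection, shrinking the RL search space.
import random

# Hypothetical preference table distilled from a knowledge graph of past allocations.
PREFERRED_JOB_TYPES = {"machine_1": {"milling", "drilling"}, "machine_2": {"turning"}}

def masked_action(machine_id, candidate_jobs, q_values, epsilon=0.1):
    """candidate_jobs: list of (job_id, job_type); q_values: dict job_id -> value."""
    preferred = [j for j, t in candidate_jobs if t in PREFERRED_JOB_TYPES[machine_id]]
    pool = preferred or [j for j, _ in candidate_jobs]   # fall back if no preferred job fits
    if random.random() < epsilon:
        return random.choice(pool)
    return max(pool, key=lambda j: q_values.get(j, 0.0))

jobs = [("j1", "milling"), ("j2", "turning"), ("j3", "drilling")]
print(masked_action("machine_1", jobs, {"j1": 0.4, "j3": 0.9}, epsilon=0.0))  # -> "j3"
```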
- Research Article
- 10.1109/tpwrs.2022.3223255
- Nov 1, 2023
- IEEE Transactions on Power Systems
With the massive penetration of renewable energy, traditional reinforcement learning algorithms suffer from slow convergence and large area control error (ACE) in interconnected power systems. This paper proposes data-driven load frequency control (LFC) based on multi-agent reinforcement learning with an attention mechanism for interconnected power systems. The method has two phases: in centralized training, the agents are trained with an experience replay mechanism; in decentralized execution, each trained agent automatically regulates the generation power to control the load frequency using real-time access to the grid data in its area. By introducing a critic network with an attention mechanism, an agent can selectively focus on specific information in the environment. The attention mechanism reduces the training time for reinforcement learning while improving control performance under disturbance. A novel reward function based on a cooperation mechanism is used to score the performance of each agent, which guides the reinforcement learning algorithm to reduce the ACE of each area simultaneously. The proposed method is validated on the IEEE three-area interconnected power system; the results show that it reduces the ACE caused by load and renewable power disturbances and greatly reduces the training time of the algorithm.
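A minimal sketch of the attention idea in the critic, assuming scaled dot-product attention over the other agents' encoded observation-action pairs (shapes and values are illustrative, not the paper's network):

```python
# Minimal sketch (assumed shapes): each area agent's critic attends over the
# other agents' encodings, so it can weight the areas most relevant to its own
# ACE when evaluating an action.
import numpy as np

def attention_context(query, keys, values):
    """query: (d,), keys/values: (n_other_agents, d); returns a weighted context vector."""
    scores = keys @ query / np.sqrt(len(query))       # scaled dot-product scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax attention weights
    return weights @ values

rng = np.random.default_rng(0)
query = rng.normal(size=8)                 # this agent's encoded state-action
others = rng.normal(size=(2, 8))           # the other two areas' encodings
context = attention_context(query, others, others)
print(context.shape)                       # (8,) context vector fed to the critic head
```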
- Research Article
- 10.1088/2632-072x/ad3f65
- Apr 26, 2024
- Journal of Physics: Complexity
Decision-making often overlooks the feedback between agents and the environment. Reinforcement learning is widely employed, through exploratory experimentation, to address problems involving states, actions, rewards, and decision-making in various contexts. This work takes a new perspective in which individuals continually update their policies based on interactions with the spatial environment, aiming to maximize cumulative rewards and learn the optimal strategy. Specifically, we use the Q-learning algorithm to study the emergence of cooperation in a spatial population playing the donation game. Each individual has a Q-table that guides its decision-making in the game. Interestingly, we find that cooperation emerges within this introspective learning framework, and a smaller learning rate and a higher discount factor make cooperation more likely to occur. Through analysis of the Q-table evolution, we disclose the underlying mechanism for cooperation, which may provide some insight into the emergence of cooperation in real-world systems.
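A minimal sketch of Q-learning agents playing the donation game on a lattice, assuming a stateless (single-state) simplification in which each site keeps a Q-value per action and plays its four nearest neighbors (payoff parameters and update constants are illustrative, not the paper's setup):

```python
# Minimal sketch (assumed payoff and update details): each lattice site keeps a
# Q-table over {defect, cooperate}, collects donation-game payoffs from its four
# neighbors, and updates with a standard Q-learning rule.
import numpy as np

L, B, C = 10, 1.0, 0.4             # lattice size, donation benefit, donation cost
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.05
rng = np.random.default_rng(1)
Q = np.zeros((L, L, 2))            # per-site Q-values: action 0 = defect, 1 = cooperate

def step():
    acts = np.where(rng.random((L, L)) < EPS,
                    rng.integers(0, 2, (L, L)),    # epsilon-greedy exploration
                    Q.argmax(axis=2))
    payoff = np.zeros((L, L))
    for dx, dy in [(1, 0), (-1, 0), (0, 1), (0, -1)]:
        neigh = np.roll(acts, (dx, dy), axis=(0, 1))
        payoff += B * neigh - C * acts             # receive b from cooperating neighbors, pay c per donation
    best_next = Q.max(axis=2)                      # single-state simplification
    idx = np.indices((L, L))
    Q[idx[0], idx[1], acts] += ALPHA * (payoff + GAMMA * best_next - Q[idx[0], idx[1], acts])
    return acts.mean()                             # fraction of cooperators

for _ in range(200):
    coop = step()
print(round(coop, 2))
```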
- Research Article
- 10.3389/fncom.2022.818985
- Apr 8, 2022
- Frontiers in Computational Neuroscience
Lifetime learning, or the change (or acquisition) of behaviors during a lifetime based on experience, is a hallmark of living organisms. Multiple mechanisms may be involved, but biological neural circuits have repeatedly demonstrated a vital role in the learning process. These neural circuits are recurrent, dynamic, and non-linear, and models of neural circuits employed in neuroscience and neuroethology accordingly tend to involve continuous-time, non-linear, and recurrently interconnected components. Currently, the main approach for finding configurations of dynamical recurrent neural networks that demonstrate behaviors of interest is to use stochastic search techniques, such as evolutionary algorithms. In an evolutionary algorithm, these dynamic recurrent neural networks are evolved to perform the behavior over multiple generations, through selection, inheritance, and mutation, across a population of solutions. Although these systems can be evolved to exhibit lifetime learning behavior, there are no explicit rules built into these dynamic recurrent neural networks that facilitate learning during their lifetime (e.g., reward signals). In this work, we examine a biologically plausible lifetime learning mechanism for dynamical recurrent neural networks. We focus on a recently proposed reinforcement learning mechanism inspired by neuromodulatory reward signals and ongoing fluctuations in synaptic strengths. Specifically, we extend one of the best-studied and most commonly used dynamic recurrent neural networks to incorporate the reinforcement learning mechanism. First, we demonstrate that this extended dynamical system (model and learning mechanism) can autonomously learn to perform a central pattern generation task. Second, we compare the robustness and efficiency of the reinforcement learning rules with two baseline models, a random walk and a hill-climbing walk through parameter space. Third, we systematically study the effect of the different meta-parameters of the learning mechanism on behavioral learning performance. Finally, we report preliminary results exploring the generality and scalability of this learning mechanism for dynamical neural networks, as well as directions for future work.
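A minimal sketch of the kind of reward-modulated fluctuation rule described above, assuming each weight fluctuates around a centre value that drifts toward fluctuations coinciding with above-baseline reward (constants and class structure are illustrative, not the published mechanism):

```python
# Minimal sketch (assumed constants): weights carry an ongoing random
# fluctuation around a centre value, and the centre drifts toward fluctuations
# that coincide with reward above a running baseline.
import numpy as np

rng = np.random.default_rng(0)

class FluctuatingWeights:
    def __init__(self, n, flux_amp=0.05, learn_rate=0.1, reward_tau=0.95):
        self.center = rng.normal(scale=0.5, size=n)   # consolidated weights
        self.flux = np.zeros(n)                        # current fluctuation
        self.flux_amp, self.learn_rate = flux_amp, learn_rate
        self.reward_tau, self.baseline = reward_tau, 0.0

    def sample(self):
        # Weights actually used by the network at this moment.
        self.flux = rng.normal(scale=self.flux_amp, size=self.center.shape)
        return self.center + self.flux

    def reinforce(self, reward):
        # Consolidate the fluctuation in proportion to reward above baseline.
        advantage = reward - self.baseline
        self.center += self.learn_rate * advantage * self.flux
        self.baseline = self.reward_tau * self.baseline + (1 - self.reward_tau) * reward

w = FluctuatingWeights(n=4)
params = w.sample()               # evaluate the recurrent network with these weights...
w.reinforce(reward=0.3)           # ...then reinforce using the observed reward
```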
- Book Chapter
- 10.1007/978-3-319-30698-8_12
- Jan 1, 2016
Learning mechanisms in selection hyper-heuristics are used to identify the most appropriate subset of heuristics when solving a given problem. Several experimental studies have used additive reinforcement learning mechanisms; however, these are inconclusive with regard to the performance of selection hyper-heuristics with these learning mechanisms. This paper points out limitations of learning with additive reinforcement learning mechanisms. Our theoretical results show that if the probability of improving the candidate solution at each point of the search process is less than 1/2, which is a mild assumption, then additive reinforcement learning mechanisms perform asymptotically like the simple random mechanism, which chooses heuristics uniformly at random. In addition, frequently used adaptation schemes can negatively affect the memory of reinforcement learning mechanisms. We also conducted experiments on two well-known combinatorial optimisation problems, bin-packing and flow-shop, and the obtained results confirm the theoretical findings. This study suggests that alternatives to additive updates in reinforcement learning mechanisms should be considered.
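A minimal sketch of an additive reinforcement learning selection mechanism of the kind analyzed above, assuming a bounded integer score per low-level heuristic that is incremented on improvement and decremented otherwise (heuristic names and bounds are illustrative):

```python
# Minimal sketch (assumed scoring details): each low-level heuristic has a
# bounded integer score, updated additively, and the highest-scoring heuristic
# is applied next (ties broken at random).
import random

scores = {"swap": 0, "shift": 0, "two_opt": 0}

def select_heuristic():
    best = max(scores.values())
    return random.choice([h for h, s in scores.items() if s == best])

def additive_update(heuristic, improved, lower=0, upper=20):
    delta = 1 if improved else -1
    scores[heuristic] = min(upper, max(lower, scores[heuristic] + delta))

h = select_heuristic()
additive_update(h, improved=random.random() < 0.5)   # hypothetical outcome
```

Intuitively, when improvements occur with probability below 1/2 the scores undergo a random walk with negative drift and tend to sit near the lower bound for every heuristic, which helps explain why such a mechanism ends up behaving like uniform random selection.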
- Book Chapter
- 10.1007/11893028_54
- Jan 1, 2006
Online reinforcement learning updates the estimated value of the (state, action) pair selected in the present state before the transition to the next state, and therefore needs polynomial search time to find the optimal value function. However, many methods proposed as online reinforcement learning update the estimated values only of the (state, action) pairs the agent selects in the present state, while the values of unselected pairs are evaluated in other episodes, so they are not fully online. In this paper, we therefore propose an online ant reinforcement learning method that combines Ant-Q with eligibility traces. The eligibility trace is one of the basic mechanisms in reinforcement learning for handling delayed reward: traces indicate the degree to which each state is eligible for undergoing learning changes should a reinforcing event occur. Formally, there are two kinds of eligibility traces, accumulating traces and replacing traces. Here we propose online ant reinforcement learning algorithms using replacing traces (replace-trace methods), a hybrid of Ant-Q and eligibility traces. Although replacing traces differ only slightly from accumulating traces, they can produce a significant improvement in optimization. Experiments show that the proposed method converges to the optimal solution faster than Ant Colony System and Ant-Q.
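A minimal sketch contrasting replacing and accumulating eligibility traces in a generic tabular update (this is the standard trace mechanism, not the paper's Ant-Q hybrid; states, actions, and constants are illustrative):

```python
# Minimal sketch: an accumulating trace adds 1 to the visited (state, action)
# pair each step, whereas a replacing trace resets it to 1; all eligible pairs
# then share credit for the TD error.
import numpy as np

N_STATES, N_ACTIONS = 5, 2
GAMMA, LAMBDA, ALPHA = 0.9, 0.8, 0.1
Q = np.zeros((N_STATES, N_ACTIONS))
E = np.zeros((N_STATES, N_ACTIONS))     # eligibility traces

def trace_step(s, a, r, s_next, a_next, replacing=True):
    global Q, E
    E *= GAMMA * LAMBDA                  # decay all traces
    if replacing:
        E[s, a] = 1.0                    # replacing trace: reset to 1
    else:
        E[s, a] += 1.0                   # accumulating trace: add 1
    td_error = r + GAMMA * Q[s_next, a_next] - Q[s, a]
    Q += ALPHA * td_error * E            # credit all eligible pairs

trace_step(s=0, a=1, r=1.0, s_next=1, a_next=0, replacing=True)
```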
- Research Article
- 10.1093/cercor/bhr117
- Jun 21, 2011
- Cerebral Cortex
The frontal lobes may be organized hierarchically such that more rostral frontal regions modulate cognitive control operations in caudal regions. In our companion paper (Frank MJ, Badre D. 2011. Mechanisms of hierarchical reinforcement learning in corticostriatal circuits I: computational analysis. 22:509-526), we provide novel neural circuit and algorithmic models of hierarchical cognitive control in cortico-striatal circuits. Here, we test key model predictions using functional magnetic resonance imaging (fMRI). Our neural circuit model proposes that contextual representations in rostral frontal cortex influence the striatal gating of contextual representations in caudal frontal cortex. Reinforcement learning operates at each level, such that the system adaptively learns to gate higher order contextual information into rostral regions. Our algorithmic Bayesian "mixture of experts" model captures the key computations of this neural model and provides trial-by-trial estimates of the learner's latent hypothesis states. In the present paper, we used these quantitative estimates to reanalyze fMRI data from a hierarchical reinforcement learning task reported in Badre D, Kayser AS, D'Esposito M. 2010. Frontal cortex and the discovery of abstract action rules. Neuron. 66:315--326. Results validate key predictions of the models and provide evidence for an individual cortico-striatal circuit for reinforcement learning of hierarchical structure at a specific level of policy abstraction. These findings are initially consistent with the proposal that hierarchical control in frontal cortex may emerge from interactions among nested cortico-striatal circuits at different levels of abstraction.
- Research Article
- 10.1016/j.jmsy.2023.03.003
- Apr 9, 2023
- Journal of Manufacturing Systems
Dynamic production scheduling towards self-organizing mass personalization: A multi-agent dueling deep reinforcement learning approach
- Research Article
- 10.1016/j.energy.2024.130434
- Jan 22, 2024
- Energy
Intelligent optimization method for real-time decision-making in laminated cooling configurations through reinforcement learning
- Research Article
- 10.1016/j.chaos.2025.116975
- Nov 1, 2025
- Chaos, Solitons & Fractals
- Research Article
- 10.1016/j.chaos.2025.116923
- Nov 1, 2025
- Chaos, Solitons & Fractals
- Research Article
- 10.1016/j.chaos.2025.117014
- Nov 1, 2025
- Chaos, Solitons & Fractals
- Research Article
- 10.1016/j.chaos.2025.116934
- Nov 1, 2025
- Chaos, Solitons & Fractals
- Research Article
- 10.1016/j.chaos.2025.117160
- Nov 1, 2025
- Chaos, Solitons & Fractals
- Research Article
- 10.1016/j.chaos.2025.117023
- Nov 1, 2025
- Chaos, Solitons & Fractals
- Research Article
- 10.1016/j.chaos.2025.117020
- Nov 1, 2025
- Chaos, Solitons & Fractals
- Research Article
- 10.1016/j.chaos.2025.116967
- Nov 1, 2025
- Chaos, Solitons & Fractals
- Research Article
- 10.1016/j.chaos.2025.116970
- Nov 1, 2025
- Chaos, Solitons & Fractals
- Research Article
- 10.1016/j.chaos.2025.117134
- Nov 1, 2025
- Chaos, Solitons & Fractals