Small groups nurturing collective wisdom: The punishment-prediction reinforcement learning mechanism for multi-group cooperation

References (showing 10 of 58 papers)
  • doi:10.1016/j.amc.2025.129533
Spatial public goods games with queueing and reputation
  • Nov 1, 2025
  • Applied Mathematics and Computation
  • Gui Zhang et al.

  • doi:10.1038/ncomms4976
Extortion subdues human players but is finally punished in the prisoner’s dilemma
  • May 29, 2014
  • Nature Communications
  • Christian Hilbe et al.

  • doi:10.1007/s11432-023-4287-2
Asymmetric interaction preference induces cooperation in human-agent hybrid game
  • Apr 25, 2025
  • Science China Information Sciences
  • Danyang Jia et al.

  • doi:10.1016/j.chaos.2025.115998
Multi-games on a dynamic network and the evolution of cooperation
  • Mar 1, 2025
  • Chaos, Solitons & Fractals
  • Yijie Huang et al.

  • doi:10.1126/science.1198349
Cooperation and the Commons
  • Nov 11, 2010
  • Science
  • Björn Vollan et al.

  • doi:10.1088/1367-2630/18/10/103007
Interactive diversity promotes the evolution of cooperation in structured populations
  • Oct 1, 2016
  • New Journal of Physics
  • Qi Su et al.

  • doi:10.1016/j.physa.2024.129999
Reputation-based disconnection-reconnection mechanism in Prisoner's Dilemma Game within dynamic networks
  • Jul 26, 2024
  • Physica A: Statistical Mechanics and its Applications
  • Qianwei Zhang et al.

  • doi:10.1016/j.physleta.2018.05.020
Evolution of global cooperation and ethnocentrism in group-structured populations
  • May 17, 2018
  • Physics Letters A
  • Shiping Gao et al.

  • doi:10.1016/j.chaos.2025.116496
Evolution of cooperation in spatial public goods games with resource-allocating leaders
  • Aug 1, 2025
  • Chaos, Solitons & Fractals
  • Ji Quan et al.

  • doi:10.1038/nature04605
A simple rule for the evolution of cooperation on graphs and social networks
  • May 1, 2006
  • Nature
  • Hisashi Ohtsuki et al.

Similar Papers
  • Research Article
  • doi:10.1016/j.neucom.2012.04.012
Thalamic cooperation between the cerebellum and basal ganglia with a new tropism-based action-dependent heuristic dynamic programming method
  • May 10, 2012
  • Neurocomputing
  • Xiaogang Ruan et al.

  • Supplementary Content
  • doi:10.1155/2022/1327620
Multi-Agent-Based Film Editing Collaboration System
  • Jul 1, 2022
  • Computational Intelligence and Neuroscience
  • Along Liang

To achieve effective cooperation among editor agents in a film and television editing collaboration system, this article starts from the observation that the state of the editing and production process changes under the cross-influence of multiple factors, so that a single agent can no longer meet the demands of current film and television production. From a systems-theory point of view, the article constructs the learner agents in the editing system by introducing a new cooperation mechanism: a multi-agent collaborative system model. Collaboration among multiple agents, and reinforcement learning between the editor agents, are realized within this multi-agent editing system; the agents' separate operating mechanisms are organized together and cooperate harmoniously to complete the collaborative work of the editing system, which improves the interaction efficiency between editor agents. This cooperative learning approach allows for successful collaboration among editor agents. A Bayesian technique is used to estimate the probability of effective cooperation between two agents, and a trust model based on this method is presented, making up for the shortcomings of existing collaborative learning systems. The multi-agent collaboration system is then used for production in the film and television editing collaboration system, where many scenes and segments are created with computer special effects.
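
The abstract does not specify the exact form of the Bayesian trust model. A standard way to estimate the probability of successful cooperation between two agents is a Beta-Bernoulli update; the sketch below assumes that form, and the class name, agent names, and prior are hypothetical.

```python
# Hedged sketch: a Beta-Bernoulli trust estimate of the probability that
# cooperation with a given partner agent succeeds. The paper's exact Bayesian
# model is not given in the abstract; this is one standard formulation.
class TrustModel:
    def __init__(self, alpha=1.0, beta=1.0):
        # Beta(alpha, beta) prior over the partner's cooperation success rate.
        self.alpha = alpha
        self.beta = beta

    def update(self, success: bool) -> None:
        # Each observed interaction is treated as a Bernoulli outcome.
        if success:
            self.alpha += 1
        else:
            self.beta += 1

    def trust(self) -> float:
        # Posterior mean estimate of the cooperation success probability.
        return self.alpha / (self.alpha + self.beta)

# Usage: pick the editor agent with the highest estimated cooperation success.
trust = {"cutter": TrustModel(), "colorist": TrustModel()}
trust["cutter"].update(True)
trust["colorist"].update(False)
best = max(trust, key=lambda name: trust[name].trust())
print(best, trust[best].trust())
```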

  • Research Article
  • doi:10.3233/ifs-130945
Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks
  • Jan 1, 2014
  • Journal of Intelligent & Fuzzy Systems
  • Behzad Ghazanfari et al.

Nash Q-learning and team Q-learning are extensions of reinforcement learning used as cooperation mechanisms in multi-agent systems. Because the complexity of multi-agent reinforcement learning is extremely high, complexity-reduction methods such as hierarchical structures, abstraction, and task decomposition are necessary. A typical approach to task decomposition defines subtasks by extracting bottlenecks. In this paper, bottlenecks are extracted automatically to create temporally extended actions, which are then added to the agents' available actions in the cooperation mechanisms of multi-agent systems. The updating equations of team Q-learning and Nash Q-learning are extended to incorporate these temporally extended actions, considerably increasing learning performance in both methods. The experimental results show a marked improvement in the learning of cooperation mechanisms when augmented with the extracted temporally extended actions in multi-agent problems.
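
The paper's extended updating equations act over joint actions; as a rough single-agent illustration of how a temporally extended action (option) enters a Q-update, here is a minimal SMDP-style sketch. The function name, state encoding, and parameter values are illustrative assumptions, not the paper's equations.

```python
# Hedged sketch: an SMDP-style Q-update for a temporally extended action
# (option), the kind of update the paper adds to team/Nash Q-learning.
# Single-agent form only; the joint-action extensions are not reproduced.
def smdp_q_update(Q, s, option, reward_sum, duration, s_next,
                  alpha=0.1, gamma=0.95):
    """Update Q[s][option] after an option that ran `duration` primitive steps
    and accumulated discounted reward `reward_sum` along the way."""
    target = reward_sum + (gamma ** duration) * max(Q[s_next].values())
    Q[s][option] += alpha * (target - Q[s][option])

# Usage with a toy two-state table mixing an option and a primitive action.
Q = {0: {"go_to_bottleneck": 0.0, "step_left": 0.0},
     1: {"go_to_bottleneck": 0.0, "step_left": 0.0}}
smdp_q_update(Q, s=0, option="go_to_bottleneck",
              reward_sum=1.8, duration=3, s_next=1)
print(Q[0])
```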

  • Video Transcripts
  • doi:10.48448/zvx0-ef72
Investigating the Impact of Metacognition on Working Memory and Procedural Learning Mechanisms
  • Jul 4, 2021
  • Xiaochen Wu et al.

This study examined the influence of metacognition on declarative and reinforcement learning (RL) mechanisms. We collected data from 218 undergraduates using a within-subjects metacognitive manipulation of a stimulus-response (S-R) learning task created by Collins (2018). Contributions of the declarative and RL mechanisms are assessed by differences in learning rate for blocks of 3 items versus 6 items, and by the rate of forgetting on an incidental post-test. If metacognition differentially affects declarative learning and RL, we expect a three-way interaction between task phase (learning/post-test), block type (long/short), and metacognition (before/during). Our results showed significant main effects of phase (F(1,217)=143.18, p=9.18e-32), length (F(1,217)=541.11, p=2.06e-104), and metacognition (F(1,217)=19.78, p=9.22e-06), with better performance during the learning phase, in short blocks, and under the metacognitive manipulation. A significant phase-by-metacognition interaction (F(1,217)=8.11, p=4.45e-03) suggested that metacognitive monitoring improved test performance while having little effect on learning performance.
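
For readers who want to see how such a within-subjects design is tested, here is a minimal sketch of a 2x2x2 repeated-measures ANOVA using statsmodels' AnovaRM. The column names and randomly generated data are placeholders, not the study's data, so the output will not reproduce the reported F values.

```python
# Hedged sketch: a within-subjects ANOVA of the 2x2x2 design described above
# (phase x block length x metacognition). Data are random placeholders.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
rows = []
for subj in range(218):
    for phase in ("learning", "posttest"):
        for length in ("short", "long"):
            for metacog in ("before", "during"):
                rows.append((subj, phase, length, metacog,
                             rng.normal(0.7, 0.1)))
df = pd.DataFrame(rows, columns=["subject", "phase", "length",
                                 "metacog", "accuracy"])

# One observation per subject per cell, so the design is balanced.
res = AnovaRM(df, depvar="accuracy", subject="subject",
              within=["phase", "length", "metacog"]).fit()
print(res)  # F tests for main effects and all interactions
```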

  • Conference Article
  • doi:10.1109/cec.2008.4630945
A Coalition-Based Metaheuristic for the vehicle routing problem
  • Jun 1, 2008
  • David Meignan et al.

This paper presents a population-based metaheuristic adopting the metaphor of social autonomous agents. In this context, agents cooperate and self-adapt in order to collectively solve a given optimization problem. From an evolutionary computation point of view, the mechanisms driving the search combine intensification and diversification operators, such as local search and mutation or recombination. The multi-agent paradigm mainly focuses on the adaptive capabilities of individual agents evolving in a context of decentralized control and asynchronous communication. In the proposed metaheuristic, each agent's behavior is guided by a decision process for the choice of operators, which is dynamically adapted during the search using reinforcement learning and mimetism learning between agents. The approach is called Coalition-Based Metaheuristic (CBM) to reflect the strong autonomy conferred on the agents. It is applied to the vehicle routing problem to demonstrate the performance of the learning and cooperation mechanisms.
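
The abstract describes a decision process for operator choice adapted by reinforcement learning; a minimal sketch of one such mechanism (epsilon-greedy selection over per-operator value estimates) follows. The operator names, update rule, and constants are illustrative assumptions, not CBM's actual decision process.

```python
import random

# Hedged sketch: reinforcement-learning-guided operator selection of the kind
# CBM's agents use to choose between intensification and diversification
# operators. Weights, operator names, and the update rule are illustrative.
operators = ["local_search", "mutation", "recombination"]
value = {op: 1.0 for op in operators}  # learned preference per operator

def select_operator(epsilon=0.1):
    if random.random() < epsilon:
        return random.choice(operators)           # diversify: explore
    return max(operators, key=value.__getitem__)  # intensify: exploit

def reinforce(op, improvement, alpha=0.2):
    # Reward an operator in proportion to the solution improvement it produced.
    value[op] += alpha * (improvement - value[op])

op = select_operator()
reinforce(op, improvement=0.8)  # e.g., route cost decreased by 0.8
print(value)
# Mimetism learning between agents could be sketched as copying the `value`
# table of a better-performing agent in the coalition.
```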

  • Conference Article
  • doi:10.2514/6.2023-1070
Verification of Adversarially Robust Reinforcement Learning Mechanisms in Aerospace Systems
  • Jan 19, 2023
  • Taehwan Seo et al.

In this paper, we present a detailed framework for the verification and validation of reinforcement learning (RL) mechanisms in aerospace control software. First, we integrate an adversarial input mitigation and moving-target defense framework and verify its efficacy in real time. Then, we provide a testing framework to verify the robustness of closed-loop RL mechanisms. The reliability of the adversarially robust RL mechanism is tested using the VerifAI toolkit and an X-Plane 11 Cessna 172.

  • Research Article
  • doi:10.1007/s10845-024-02494-0
Knowledge graph-enhanced multi-agent reinforcement learning for adaptive scheduling in smart manufacturing
  • Oct 13, 2024
  • Journal of Intelligent Manufacturing
  • Zhaojun Qin et al.

The self-organizing manufacturing network has emerged as a viable solution for adaptive manufacturing control within the mass-personalization paradigm. This approach involves three critical elements: system modeling and control architecture, interoperable communication, and adaptive manufacturing control. However, current research often treats interoperable communication and adaptive manufacturing control as isolated areas of study. To address this gap, this paper introduces a Knowledge Graph-enhanced Multi-Agent Reinforcement Learning (MARL) method that integrates interoperable communication via knowledge graphs with adaptive manufacturing control via reinforcement learning. We hypothesize that implicit domain knowledge obtained from historical production job-allocation records can guide each agent to learn more effective scheduling policies at an accelerated learning rate, on the premise that machine assignment preferences can effectively reduce the reinforcement learning search space. Specifically, we redesign the machine agents with new observation, action, reward, and cooperation mechanisms that account for machine preferences, building upon our previous MARL base model. The scheduling policies are trained in extensive simulation experiments that reflect manufacturing requirements. During training, our approach demonstrates improved training speed compared with individual reinforcement learning methods under the same hyperparameters. The scheduling policies generated by our Knowledge Graph-enhanced MARL also outperform both individual reinforcement learning methods and heuristic rules under dynamic manufacturing settings.
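
The abstract's premise, that machine-assignment preferences shrink the RL search space, can be illustrated by biasing action selection with preference statistics. The sketch below is a hypothetical stand-in for the paper's knowledge-graph integration; all names and numbers are illustrative.

```python
import numpy as np

# Hedged sketch: biasing a machine agent's action selection with job-allocation
# preferences mined from historical records (a stand-in for the knowledge
# graph). Sizes, the preference table, and kappa are illustrative assumptions.
n_jobs, n_machines = 4, 3
q = np.zeros((n_jobs, n_machines))  # learned Q-values: Q[job, machine]

# Historical preference: empirical frequency of job type j on machine m.
pref = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.8, 0.1],
                 [0.3, 0.3, 0.4],
                 [0.5, 0.4, 0.1]])

def choose_machine(job, tau=1.0, kappa=2.0):
    # Softmax over Q-values plus a knowledge bias; kappa controls how strongly
    # historical preferences shrink the effective search space.
    logits = q[job] / tau + kappa * np.log(pref[job] + 1e-9)
    p = np.exp(logits - logits.max())
    return np.random.choice(n_machines, p=p / p.sum())

print(choose_machine(job=0))  # usually machine 0, the historical favorite
```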

  • Research Article
  • doi:10.1109/tpwrs.2022.3223255
Data-Driven Load Frequency Control Based on Multi-Agent Reinforcement Learning With Attention Mechanism
  • Nov 1, 2023
  • IEEE Transactions on Power Systems
  • Fan Yang et al.

With the massive penetration of renewable energy, traditional reinforcement learning algorithms suffer from slow convergence and large area control error (ACE) in interconnected power systems. This paper proposes a data-driven load frequency control (LFC) method based on multi-agent reinforcement learning with an attention mechanism for interconnected power systems. The method has two phases: in centralized training, the agents are trained with an experience replay mechanism; in decentralized execution, each trained agent automatically regulates generation power to control the load frequency through real-time access to grid data in its area. By introducing a critic network with an attention mechanism, each agent can selectively focus on specific information in the environment; the attention mechanism reduces training time while improving control performance under disturbance. A novel reward function based on a cooperation mechanism scores the performance of each agent, guiding the reinforcement learning algorithm to reduce the ACE of every area simultaneously. The proposed method is validated on the IEEE three-area interconnected power system; it reduces the ACE caused by load and renewable-power disturbances and greatly reduces the training time of the algorithm.
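
The abstract's cooperation-based reward is not given explicitly; one plausible shape, penalizing an agent's own ACE together with the other areas' errors so that all agents reduce ACE simultaneously, is sketched below with an illustrative weighting.

```python
# Hedged sketch: a cooperation-based reward for load frequency control that
# scores each area agent on its own area control error (ACE) and on the other
# areas' errors, pushing all agents to reduce ACE simultaneously.
# The weighting w=0.5 is an illustrative assumption, not the paper's choice.
def cooperative_reward(ace, i, w=0.5):
    """ace: list of current ACE values per area; i: this agent's area index."""
    own = abs(ace[i])
    others = sum(abs(a) for j, a in enumerate(ace) if j != i) / (len(ace) - 1)
    return -(own + w * others)

# Usage: three areas; agent 0 is rewarded less when neighbors' ACE is large.
print(cooperative_reward([0.02, -0.05, 0.01], i=0))
```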

  • Research Article
  • doi:10.1088/2632-072x/ad3f65
The emergence of cooperation via Q-learning in spatial donation game
  • Apr 26, 2024
  • Journal of Physics: Complexity
  • Jing Zhang et al.

Decision-making models often overlook the feedback between agents and the environment. Reinforcement learning is widely employed, through exploratory experimentation, to address problems involving states, actions, rewards, and decision-making in various contexts. This work takes a new perspective in which individuals continually update their policies based on interactions with the spatial environment, aiming to maximize cumulative rewards and learn the optimal strategy. Specifically, we use the Q-learning algorithm to study the emergence of cooperation in a spatial population playing the donation game, where each individual has a Q-table that guides its decision-making. Interestingly, we find that cooperation emerges within this introspective learning framework, and that a smaller learning rate and a higher discount factor make cooperation more likely. Through analysis of the Q-table evolution, we disclose the underlying mechanism for cooperation, which may offer insights into the emergence of cooperation in real-world systems.
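
A minimal, self-contained sketch of the setup the abstract describes, two Q-learning agents playing a repeated donation game with the partner's last action as the state, is given below. The payoff values and hyperparameters are illustrative choices, not the paper's; lowering alpha and raising gamma in this toy model mirrors the conditions the authors report as favoring cooperation.

```python
import random

# Hedged sketch of Q-learning in a donation game (illustrative parameters).
# Actions: 0 = defect, 1 = cooperate (donate at cost c, partner gains b).
b, c = 2.0, 1.0
alpha, gamma, epsilon = 0.1, 0.9, 0.05  # learning rate, discount, exploration

def make_q():
    # One Q-table per agent; state = partner's last action (0 or 1).
    return [[0.0, 0.0], [0.0, 0.0]]

def choose(q, state):
    if random.random() < epsilon:
        return random.randrange(2)
    return max((0, 1), key=lambda a: q[state][a])

def payoff(my_act, partner_act):
    return b * partner_act - c * my_act  # gain b if partner donates, pay c if I do

q1, q2 = make_q(), make_q()
s1 = s2 = 0  # assume both partners defected before the first round
for _ in range(50_000):
    a1, a2 = choose(q1, s1), choose(q2, s2)
    r1, r2 = payoff(a1, a2), payoff(a2, a1)
    ns1, ns2 = a2, a1  # next state = partner's action this round
    q1[s1][a1] += alpha * (r1 + gamma * max(q1[ns1]) - q1[s1][a1])
    q2[s2][a2] += alpha * (r2 + gamma * max(q2[ns2]) - q2[s2][a2])
    s1, s2 = ns1, ns2

print("agent 1 Q-table:", q1)
```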

  • Research Article
  • doi:10.3389/fncom.2022.818985
Reinforcement Learning for Central Pattern Generation in Dynamical Recurrent Neural Networks
  • Apr 8, 2022
  • Frontiers in Computational Neuroscience
  • Jason A Yoder et al.

Lifetime learning, the change or acquisition of behaviors during a lifetime based on experience, is a hallmark of living organisms. Multiple mechanisms may be involved, but biological neural circuits have repeatedly demonstrated a vital role in the learning process. These circuits are recurrent, dynamic, and non-linear, and the models employed in neuroscience and neuroethology accordingly tend to involve continuous-time, non-linear, recurrently interconnected components. Currently, the main approach for finding configurations of dynamical recurrent neural networks that exhibit behaviors of interest is stochastic search, such as evolutionary algorithms, in which networks are evolved to perform the behavior over multiple generations through selection, inheritance, and mutation across a population of solutions. Although these systems can be evolved to exhibit lifetime learning behavior, there are no explicit rules built into them that facilitate learning during their lifetime (e.g., reward signals). In this work, we examine a biologically plausible lifetime learning mechanism for dynamical recurrent neural networks: a recently proposed reinforcement learning mechanism inspired by neuromodulatory reward signals and ongoing fluctuations in synaptic strengths. Specifically, we extend one of the best-studied and most commonly used dynamical recurrent neural networks to incorporate this reinforcement learning mechanism. First, we demonstrate that the extended dynamical system (model plus learning mechanism) can autonomously learn to perform a central pattern generation task. Second, we compare the robustness and efficiency of the reinforcement learning rules against two baseline models, a random walk and a hill-climbing walk through parameter space. Third, we systematically study how the meta-parameters of the learning mechanism affect behavioral learning performance. Finally, we report preliminary results on the generality and scalability of this learning mechanism for dynamical neural networks, together with directions for future work.
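
As a rough illustration of reward-modulated synaptic fluctuation on a continuous-time recurrent neural network (CTRNN), the sketch below perturbs weights, keeps a perturbation when a reward estimate rises above a running baseline, and decays that baseline over trials. Network size, the reward proxy, and all constants are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Hedged sketch: reward-modulated synaptic fluctuation on a tiny CTRNN, in the
# spirit of the mechanism described above (ongoing weight fluctuations retained
# when a neuromodulatory reward signal rises). All values are illustrative.
rng = np.random.default_rng(1)
n, dt, tau = 3, 0.01, 0.5
W = rng.normal(0, 0.5, (n, n))  # synaptic weights (the learned parameters)
y = np.zeros(n)                 # neuron states at rest

def step(W, y):
    # Standard CTRNN update with logistic activation.
    act = 1.0 / (1.0 + np.exp(-y))
    return y + dt / tau * (-y + W @ act)

def reward(trace):
    # Stand-in reward: amplitude of neuron 0's oscillation over the trial.
    return np.ptp([s[0] for s in trace])

baseline, sigma, eta = 0.0, 0.05, 0.9
for trial in range(200):
    flux = rng.normal(0, sigma, W.shape)  # ongoing synaptic fluctuation
    W_try, y_try, trace = W + flux, y.copy(), []
    for _ in range(500):
        y_try = step(W_try, y_try)
        trace.append(y_try.copy())
    r = reward(trace)
    if r > baseline:                       # reward rose: keep the fluctuation
        W = W_try
    baseline = eta * baseline + (1 - eta) * r  # running reward baseline

print("final reward estimate:", baseline)
```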

  • Book Chapter
  • doi:10.1007/978-3-319-30698-8_12
Limits to Learning in Reinforcement Learning Hyper-heuristics
  • Jan 1, 2016
  • Fawaz Alanazi et al.

Learning mechanisms in selection hyper-heuristics are used to identify the most appropriate subset of heuristics for a given problem. Several experimental studies have used additive reinforcement learning mechanisms; however, their results are inconclusive regarding the performance of selection hyper-heuristics equipped with such mechanisms. This paper points out limits to learning with additive reinforcement learning mechanisms. Our theoretical results show that if the probability of improving the candidate solution at each point of the search process is less than 1/2 (a mild assumption), then additive reinforcement learning mechanisms perform asymptotically like the simple random mechanism, which chooses heuristics uniformly at random. In addition, frequently used adaptation schemes can negatively affect the memory of reinforcement learning mechanisms. We also conducted experiments on two well-known combinatorial optimisation problems, bin packing and flow shop, and the results confirm the theoretical findings. This study suggests that alternatives to additive updates in reinforcement learning mechanisms should be considered.
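
A minimal sketch of an additive reinforcement learning selection mechanism of the kind analyzed here follows; heuristic names, score bounds, and the clamped update are illustrative assumptions.

```python
import random

# Hedged sketch: additive reinforcement learning for heuristic selection.
# Each heuristic holds an additive score, incremented on improvement and
# decremented otherwise; selection picks a highest-scoring heuristic.
scores = {"swap": 0, "shift": 0, "two_opt": 0}

def select_heuristic():
    best = max(scores.values())
    return random.choice([h for h, s in scores.items() if s == best])

def update(h, improved, s_min=0, s_max=10):
    # Additive update, clamped as in commonly used adaptation schemes.
    scores[h] = min(s_max, scores[h] + 1) if improved else max(s_min, scores[h] - 1)

h = select_heuristic()
update(h, improved=random.random() < 0.4)  # placeholder improvement signal
# The chapter's point: when the improvement probability stays below 1/2,
# this mechanism behaves asymptotically like uniform random selection.
```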

  • Book Chapter
  • doi:10.1007/11893028_54
A Cooperation Online Reinforcement Learning Approach in Ant-Q
  • Jan 1, 2006
  • Seunggwan Lee

Online reinforcement learning updates the estimated value of the (state, action) pair selected in the present state before the transition to the next state, and therefore needs polynomial search time to find the optimal value function. However, many methods proposed as online reinforcement learning update only the estimated values of the (state, action) pairs the agent selects in the present state, while the values of unselected pairs are evaluated only in other episodes; such methods are not fully online. To address this problem, we propose an online ant reinforcement learning method that combines Ant-Q with eligibility traces. The eligibility trace is one of the basic mechanisms in reinforcement learning for handling delayed reward: traces indicate the degree to which each state is eligible for learning changes should a reinforcing event occur. Formally, there are two kinds of eligibility traces, accumulating traces and replacing traces. In this paper, we propose online ant reinforcement learning algorithms using replacing traces, yielding a hybrid of Ant-Q and eligibility traces. Although replacing traces differ only slightly from accumulating traces, they can produce a significant improvement in optimization. Experiments show that the proposed method converges to the optimal solution faster than Ant Colony System and Ant-Q.
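
The generic mechanism the chapter builds on, a tabular Q(lambda) update with replacing rather than accumulating traces, can be sketched as follows; this is not the paper's exact Ant-Q hybrid, and Watkins-style trace cutting on exploratory actions is omitted for brevity.

```python
import collections

# Hedged sketch: tabular Q(lambda) with *replacing* eligibility traces,
# the trace variant the chapter combines with Ant-Q.
alpha, gamma, lam = 0.1, 0.95, 0.9
Q = collections.defaultdict(float)  # Q[(state, action)]
E = collections.defaultdict(float)  # eligibility trace per (state, action)

def q_lambda_step(s, a, r, s_next, a_next_greedy):
    # TD error toward the greedy successor value.
    delta = r + gamma * Q[(s_next, a_next_greedy)] - Q[(s, a)]
    E[(s, a)] = 1.0                 # replacing trace: reset to 1, not += 1
    for key in list(E):
        Q[key] += alpha * delta * E[key]
        E[key] *= gamma * lam       # decay every trace

# Usage: one illustrative transition in a toy two-state task.
q_lambda_step(s=0, a="right", r=1.0, s_next=1, a_next_greedy="right")
print(Q[(0, "right")])
```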

  • Research Article
  • doi:10.1093/cercor/bhr117
Mechanisms of Hierarchical Reinforcement Learning in Cortico-Striatal Circuits 2: Evidence from fMRI
  • Jun 21, 2011
  • Cerebral Cortex
  • D. Badre et al.

The frontal lobes may be organized hierarchically, such that more rostral frontal regions modulate cognitive control operations in more caudal regions. In our companion paper (Frank MJ, Badre D. 2011. Mechanisms of hierarchical reinforcement learning in corticostriatal circuits I: computational analysis. Cereb Cortex. 22:509-526), we provide novel neural-circuit and algorithmic models of hierarchical cognitive control in cortico-striatal circuits. Here, we test key model predictions using functional magnetic resonance imaging (fMRI). Our neural-circuit model proposes that contextual representations in rostral frontal cortex influence the striatal gating of contextual representations in caudal frontal cortex. Reinforcement learning operates at each level, such that the system adaptively learns to gate higher-order contextual information into rostral regions. Our algorithmic Bayesian "mixture of experts" model captures the key computations of this neural model and provides trial-by-trial estimates of the learner's latent hypothesis states. In the present paper, we used these quantitative estimates to reanalyze fMRI data from the hierarchical reinforcement learning task reported in Badre D, Kayser AS, D'Esposito M. 2010. Frontal cortex and the discovery of abstract action rules. Neuron. 66:315-326. The results validate key predictions of the models and provide evidence for an individual cortico-striatal circuit for reinforcement learning of hierarchical structure at a specific level of policy abstraction. These findings are consistent with the proposal that hierarchical control in frontal cortex may emerge from interactions among nested cortico-striatal circuits at different levels of abstraction.

  • Research Article
  • doi:10.1016/j.jmsy.2023.03.003
Dynamic production scheduling towards self-organizing mass personalization: A multi-agent dueling deep reinforcement learning approach
  • Apr 9, 2023
  • Journal of Manufacturing Systems
  • Zhaojun Qin et al.

  • Research Article
  • doi:10.1016/j.energy.2024.130434
Intelligent optimization method for real-time decision-making in laminated cooling configurations through reinforcement learning
  • Jan 22, 2024
  • Energy
  • Yanjia Wang et al.

More from: Chaos, Solitons & Fractals
  • Research Article
  • doi:10.1016/j.chaos.2025.116975
A novel wind speed prediction method based on fractal wavelet decomposition explainable gated recurrent unit
  • Nov 1, 2025
  • Chaos, Solitons & Fractals
  • Ji Jin et al.

  • Research Article
  • doi:10.1016/j.chaos.2025.116923
Emotion-coupled Q-learning with cognitive bias enhances cooperation in evolutionary prisoner’s dilemma games
  • Nov 1, 2025
  • Chaos, Solitons & Fractals
  • Jiaying Lin et al.

  • Research Article
  • doi:10.1016/j.chaos.2025.117014
Dynamical modeling of hippocampal-basal ganglia interactions for spatial navigation
  • Nov 1, 2025
  • Chaos, Solitons & Fractals
  • Haobin Wei et al.

  • Research Article
  • doi:10.1016/j.chaos.2025.116934
Prey-predator dynamics with adaptive hawk-dove strategies
  • Nov 1, 2025
  • Chaos, Solitons & Fractals
  • Souvick Karmakar et al.

  • Research Article
  • doi:10.1016/j.chaos.2025.117160
Geometry-aware reservoirs: Patch-wise Jacobian lifting with cross-patch couplings for piecewise-linear modelling of chaotic flows
  • Nov 1, 2025
  • Chaos, Solitons & Fractals
  • Pradeep Singh et al.

  • Research Article
  • doi:10.1016/j.chaos.2025.117023
Study on the bifurcation of 3D patterns in a dielectric barrier discharge with modulated gas gap
  • Nov 1, 2025
  • Chaos, Solitons & Fractals
  • Lifang Dong et al.

  • Research Article
  • doi:10.1016/j.chaos.2025.117020
Ion-acoustic shock waves and chaotic motions in certain Thomas Fermi plasmas
  • Nov 1, 2025
  • Chaos, Solitons & Fractals
  • Zakia Rahim et al.

  • Research Article
  • doi:10.1016/j.chaos.2025.116967
A novel evolutionary deep reinforcement learning algorithm for the influence maximization problem in multilayer social networks
  • Nov 1, 2025
  • Chaos, Solitons & Fractals
  • Jianxin Tang et al.

  • Research Article
  • doi:10.1016/j.chaos.2025.116970
Dynamics in a general predator–prey chemotaxis model with signal-dependent diffusion and sensitivity
  • Nov 1, 2025
  • Chaos, Solitons & Fractals
  • Jianping Gao et al.

  • Research Article
  • doi:10.1016/j.chaos.2025.117134
Bio-inspired spatiotemporal encoding of sound in a dual-capacitor neuron circuit
  • Nov 1, 2025
  • Chaos, Solitons & Fractals
  • Zhigang Zhu et al.
