Leader-centric adaptive group formation via multi-agent reinforcement learning for group recommendation
- Research Article
10
- 10.1360/ssi-2020-0180
- May 1, 2022
- SCIENTIA SINICA Informationis
Reinforcement learning (RL) technology has been successfully applied to various continuous decision environments over decades of development. Nowadays, RL is attracting more attention and is even touted as one of the closest approaches to general artificial intelligence. However, real-world problems often involve multiple intelligent agents interacting with each other. Thus, we focus on multi-agent reinforcement learning (MARL) to deal with such multi-agent systems in practice. In the past decade, the combination of multi-agent systems and RL has become increasingly close, gradually forming and enriching the research field of MARL. Reviewing the studies on MARL, we found that researchers mainly solve MARL problems from three perspectives: learning framework, joint action learning, and communication-based MARL. In this paper, we focus on studies from the communication perspective. We first state the reasons for choosing communication-based MARL and then list existing studies that fall into the MARL category but differ in nature. We hope that this article can provide a reference for developing MARL methods that can solve practical problems for the national welfare.
- Research Article
1
- 10.5555/1016416.1016419
- Dec 1, 2003
This paper presents a multi-agent reinforcement learning bidding approach (MARLBS) for performing multi-agent reinforcement learning. MARLBS integrates reinforcement learning, bidding and genetic a...
- Conference Article
- 10.26868/25222708.2025.1359
- Aug 24, 2025
Aim and Approach: Large cooling water systems have great energy-saving potential because they are often operated with improper control. The coupled hydraulic and thermodynamic characteristics among chillers, pumps, and cooling towers make cooling water systems difficult to operate. This study investigates an advanced control approach based on multi-agent reinforcement learning (RL) for a large cooling water system. The multiple RL agents interact with each other to deal with the complex coupled characteristics in the cooling water system. The proposed approach is implemented in a real-world cooling water system and validated with experiments.
Scientific Innovation and Relevance: In this study, a multi-agent RL approach is proposed to separately control the chillers, pumps, and cooling towers in the cooling water system. The soft actor-critic algorithm is used in the RL agents, as it is an efficient and advanced RL algorithm for continuous and discrete control problems. The agents are designed to have private states, private actions, and shared rewards. The reward functions are carefully designed to balance the agents toward a global optimum. To provide a realistic environment for RL training, a detailed physical modeling platform of a cooling water system is established and validated. This platform can perform hydraulic and thermodynamic calculations for cooling water systems with customized numbers and characteristics of chillers, pumps, and cooling towers, so it scales to different cooling water systems. The proposed multi-agent RL controller is implemented in the building automation system of a real cooling water system. A one-month experimental study of the multi-agent RL performance is conducted. The proposed multi-agent RL controller is compared with a single-agent RL controller and a rule-based controller through real-world experiments to demonstrate the energy performance. This is a pilot study of a multi-agent RL approach to real-world cooling water system control.
Preliminary Results and Conclusions: The preliminary simulation study is conducted in the established cooling water system model. The electricity consumption simulation error is 4.0% in coefficient of variation of mean absolute error (CVMAE) compared with the measured 15-minute electricity consumption. The 6-month simulation shows that the single-agent RL can save 6.2% of the cooling water system's electricity consumption relative to the rule-based control, while the multi-agent RL can save 7.2%. In the following study, the multi-agent RL implementation will be tested with more experiments and applications for cooling water system control.
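The abstract reports its simulation error as CVMAE without defining it. The sketch below assumes the common definition, mean absolute error divided by the mean of the measured values; the function name `cvmae` and the sample readings are hypothetical and are not taken from the study.

```python
def cvmae(measured, simulated):
    """Coefficient of variation of the mean absolute error, as a fraction.

    Assumed definition: MAE divided by the mean of the measured series.
    """
    if len(measured) != len(simulated) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean absolute error between the paired samples.
    mae = sum(abs(m - s) for m, s in zip(measured, simulated)) / len(measured)
    # Normalize by the average measured value.
    mean_measured = sum(measured) / len(measured)
    return mae / mean_measured


# Hypothetical 15-minute electricity readings (kWh), for illustration only.
measured = [100.0, 120.0, 80.0, 100.0]
simulated = [96.0, 124.0, 84.0, 96.0]
print(f"CVMAE = {cvmae(measured, simulated):.1%}")
```

Normalizing by the measured mean makes the error comparable across systems of different sizes, which may be why a relative metric is reported here rather than a raw MAE.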
- Conference Article
1
- 10.23919/chicc.2018.8483961
- Jul 1, 2018
Recently, research on multi-agent reinforcement learning (MARL) has attracted tremendous interest in many applications, especially autonomous driving. The main problem of MARL is how to deal with the uncertainty in the environment and the interaction between the connected agents. To solve the problem, a distributed robust temporal differential deep Q-network algorithm (MARTD-DQN) was developed in this paper. MARTD-DQN consists of two parts, the decentralized MARL algorithm (DMARL) and the robust TD deep Q-network algorithm (RTD-DQN). DMARL improves the robustness of the policy estimation by fusing the states from the neighbors over communication networks. RTD-DQN improves the robustness to outliers through on-line estimation of the uncertainty. By combining the two algorithms, the proposed algorithm can be robust not only to node failures but also to outliers. The proposed algorithm is then applied to ACC simulations of autonomous cars. The simulation results are given to show the efficiency of the proposed algorithm.
- Research Article
9
- 10.1155/2022/5681234
- Apr 6, 2022
- Journal of Advanced Transportation
Urban traffic control systems (UTCSs) are deployed in a great number of cities despite lacking feedback when adjusting the traffic signals. The development of reinforcement learning (RL) makes it possible to apply feedback to UTCSs, and great efforts have been made on RL-based traffic control strategies. However, those studies disregard the traffic flow theory of the network and the road users’ perspectives on traffic performance. This study proposes a multiagent reinforcement learning (MARL) based traffic control strategy, in which each intersection in a macroscopic fundamental diagram (MFD) region is controlled by one agent using the level of service (LOS) and MFD-based parameters as rewards. The proposed MARL strategy was evaluated by simulation in a 3×3 grid network and compared with pretimed, actuated, and MFD-based traffic control strategies. The evaluation results showed that, at different demand levels, the proposed MARL strategy outperforms the other three traffic control strategies, to varying extents, in terms of average intersection queue length and average intersection waiting time. Results also showed that the proposed MARL strategy dissipated congestion faster than the other three control strategies. Results of the Friedman test indicated that the differences in performance between the proposed MARL strategy and the other strategies were statistically significant regardless of the demand level. The MFD in the testbed network controlled by the proposed MARL strategy was different from that controlled by the pretimed strategy, especially the MFD scatter plot. The study provides insights on considering the traffic flow theory of the network when applying MARL to traffic control strategies.
- Conference Article
9
- 10.1109/icsmc.2003.1244302
- Nov 10, 2003
Multiagent learning is deeply rooted in single-agent learning. It is commonly thought that multiagent learning, with communication and knowledge sharing, achieves better results than single-agent learning. This paper gives a different result in the robot foraging domain with multiagent and single-agent reinforcement learning methods. We show how a single-agent reinforcement learning method performs better than various multiagent reinforcement learning methods. Thus we propose a hypothesis: in normal robot foraging tasks with reinforcement learning, single-agent reinforcement learning is better than any multiagent reinforcement learning.
- Book Chapter
3
- 10.1007/978-3-030-69322-0_11
- Jan 1, 2021
We propose a novel approach to address one aspect of the non-stationarity problem in multi-agent reinforcement learning (RL), where the other agents may alter their policies due to environment changes during execution. This violates the Markov assumption that governs most single-agent RL methods and is one of the key challenges in multi-agent RL. To tackle this, we propose to train multiple policies for each agent and postpone the selection of the best policy until execution time. Specifically, we model the environment non-stationarity with a finite set of scenarios and train policies fitting each scenario. In addition to multiple policies, each agent also learns a policy predictor to determine which policy is the best given its local information. By doing so, each agent is able to adapt its policy when the environment changes and, consequently, the other agents alter their policies during execution. We empirically evaluated our method on a variety of common benchmark problems proposed for multi-agent deep RL in the literature. Our experimental results show that the agents trained by our algorithm adapt better to changing environments and outperform the state-of-the-art methods in all the tested environments.
- Research Article
475
- 10.1109/twc.2019.2933417
- Nov 1, 2019
- IEEE Transactions on Wireless Communications
Heterogeneous cellular networks can offload mobile traffic and reduce deployment costs, and have been considered a promising technique for next-generation wireless networks. Due to its non-convex and combinatorial characteristics, the joint user association and resource allocation problem is challenging to solve optimally. In this paper, a reinforcement learning (RL) approach is proposed to achieve the maximum long-term overall network utility while guaranteeing the quality of service requirements of user equipments (UEs) in the downlink of heterogeneous cellular networks. A distributed optimization method based on multi-agent RL is developed. Moreover, to solve the computationally expensive problem with the large action space, a multi-agent deep RL method is proposed. Specifically, the state, action and reward function are defined for UEs, and a dueling double deep Q-network (D3QN) strategy is introduced to obtain the nearly optimal policy. Through message passing, the distributed UEs can obtain the global state space with a small communication overhead. With the double-Q strategy and dueling architecture, D3QN can rapidly converge to a subgame perfect Nash equilibrium. Simulation results demonstrate that D3QN achieves better performance than other RL approaches in solving large-scale learning problems.
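The two ingredients named in the abstract, the dueling architecture and the double-Q strategy, can be illustrated with a minimal sketch. It follows the standard textbook formulations (mean-subtracted advantage aggregation, and action selection by the online network with evaluation by the target network); it is not the paper's implementation, and the function names and numbers are hypothetical.

```python
def dueling_q(value, advantages):
    """Combine a state value V(s) and per-action advantages A(s, a) into Q(s, a).

    Uses the common mean-subtracted form: Q = V + A - mean(A).
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


def double_q_target(reward, gamma, q_online_next, q_target_next, done):
    """Double-Q bootstrap target.

    The online network's Q-values pick the next action; the target
    network's Q-values evaluate it, which reduces overestimation.
    """
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]


# Hypothetical numbers for a state with three actions.
q_online_next = dueling_q(1.0, [0.5, -0.5, 0.0])
q_target_next = [1.0, 2.0, 0.5]
target = double_q_target(1.0, 0.9, q_online_next, q_target_next, done=False)
print(q_online_next, target)
```

In this example the online network prefers action 0, so the target uses the target network's estimate for action 0 even though the target network itself rates action 1 higher.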
- Research Article
53
- 10.1109/tccn.2019.2933420
- Dec 1, 2019
- IEEE Transactions on Cognitive Communications and Networking
We aim to jointly optimize the antenna tilt angle and the vertical and horizontal half-power beamwidths of the macrocells in a heterogeneous cellular network (HetNet). The interactions between the cells, most notably due to their coupled interference, render this optimization prohibitively complex. Utilizing a single-agent reinforcement learning (RL) algorithm for this optimization becomes quite suboptimal despite its scalability, whereas multi-agent RL algorithms yield better solutions at the expense of scalability. Hence, we propose a two-step compromise algorithm. Specifically, a multi-agent mean field RL algorithm is first utilized in the offline phase so as to transfer information as features for the second (online) phase single-agent RL algorithm, which employs a deep neural network to learn users' locations. This two-step approach is a practical solution for real deployments, which should automatically adapt to environmental changes in the network. Our results illustrate that the proposed algorithm approaches the performance of the multi-agent RL, which requires millions of trials, with hundreds of online trials, assuming relatively low environmental dynamics, and performs much better than a single-agent RL. Furthermore, the proposed algorithm is compact and implementable, and empirically appears to provide a performance guarantee regardless of the amount of environmental dynamics.
- Book Chapter
1103
- 10.1007/978-3-030-60990-0_12
- Jan 1, 2021
Recent years have witnessed significant advances in reinforcement learning (RL), which has registered tremendous success in solving various sequential decision-making problems in machine learning. Most of the successful RL applications, e.g., the games of Go and Poker, robotics, and autonomous driving, involve the participation of more than one single agent, which naturally fall into the realm of multi-agent RL (MARL), a domain with a relatively long history, and has recently re-emerged due to advances in single-agent RL techniques. Though empirically successful, theoretical foundations for MARL are relatively lacking in the literature. In this chapter, we provide a selective overview of MARL, with focus on algorithms backed by theoretical analysis. More specifically, we review the theoretical results of MARL algorithms mainly within two representative frameworks, Markov/stochastic games and extensive-form games, in accordance with the types of tasks they address, i.e., fully cooperative, fully competitive, and a mix of the two. We also introduce several significant but challenging applications of these algorithms. Orthogonal to the existing reviews on MARL, we highlight several new angles and taxonomies of MARL theory, including learning in extensive-form games, decentralized MARL with networked agents, MARL in the mean-field regime, (non-)convergence of policy-based methods for learning in games, etc. Some of the new angles extrapolate from our own research endeavors and interests. Our overall goal with this chapter is, beyond providing an assessment of the current state of the field on the mark, to identify fruitful future research directions on theoretical studies of MARL. We expect this chapter to serve as continuing stimulus for researchers interested in working on this exciting while challenging topic.
- Book Chapter
1
- 10.4018/978-1-5225-6367-9.ch007
- Jan 1, 2019
This chapter presents the adaptive group formation and/or peer help technique implemented by various systems so far, and particularly, by web-based adaptive educational hypermedia systems (AEHSs). At first, some concepts about group formation and peer help are described, and a general description of the MATHEMA is made. Subsequently, the overview of the adaptive group formation considers extensively how several systems have implemented this technique so far. A comparative study of the presented systems with the MATHEMA is performed and conclusions are drawn. The systems that implement the adaptive group formation and/or peer help technique are the (M)CSCL, AELS, and AEHSs. In presentation of the adaptive grouping algorithm of the MATHEMA, the following are described: (1) how the priority list is created; (2) how the learners are supported in selecting their most suitable partner; (3) how the negotiation protocol works; and (4) how the peer groups are automatically linked up for a collaboration agreement using a peer-to-peer communication tool.
- Book Chapter
5
- 10.1007/978-3-031-10161-8_8
- Jan 1, 2022
Using multi-agent reinforcement learning to find solutions to complex decision-making problems in shared environments has become standard practice in many scenarios. However, this is not the case in safety-critical scenarios, where the reinforcement learning process, which uses stochastic mechanisms, could lead to highly unsafe outcomes. We proposed a novel, safe multi-agent reinforcement learning approach named Assured Multi-Agent Reinforcement Learning (AMARL) to address this issue. Distinct from other safe multi-agent reinforcement learning approaches, AMARL utilises quantitative verification, a model checking technique that guarantees agent compliance of safety, performance, and non-functional requirements, both during and after the learning process. We have previously evaluated AMARL in patrolling domains with various multi-agent reinforcement learning algorithms for both homogeneous and heterogeneous systems. In this work we extend AMARL through the use of deep multi-agent reinforcement learning. This approach is particularly appropriate for systems in which the rewards are sparse and hence extends the applicability of AMARL. We evaluate our approach within a new search and collection domain which demonstrates promising results in safety standards and performance compared to algorithms not using AMARL.
Keywords: Reinforcement Learning; Multi-Agent Systems; Quantitative verification; Assurance; Multi-Agent Reinforcement Learning; Safety-critical scenarios; Safe Multi-Agent Reinforcement Learning; Assured Multi-Agent Reinforcement Learning; Deep Reinforcement Learning
- Research Article
25
- 10.1109/tfuzz.2022.3214001
- Feb 1, 2023
- IEEE Transactions on Fuzzy Systems
In multiagent reinforcement learning (RL), a multilayer fully connected neural network is used for value function approximation, which solves large-scale or continuous space problems. However, it easily falls into local optima and overfits in partially observed environments, because each agent lacks information beyond its observation field that plays a key role in decision making. Even if communication is allowed, the information received over the communication channel is noisy because of the other agents' observations, and strongly uncertain if an agent's policy is used as the communicated information. To tackle this problem, a two-stream fused fuzzy deep neural network (2s-FDNN) is proposed to reduce the uncertainty and noise of information in the communication channel. It is a parallel structure in which the fuzzy inference module reduces the uncertainty of information and the deep neural module reduces the noise of information. We then present Fuzzy MA2C, which integrates 2s-FDNN into multiagent deep RL to deal with uncertain communicated information and improve robustness and generalization in partially observed environments. We empirically evaluate our methods in two large-scale traffic signal control environments using the Simulation of Urban MObility (SUMO) simulator. Results demonstrate that our methods achieve superior performance compared with existing RL algorithms.
- Conference Article
4
- 10.1109/icdcs54860.2022.00062
- Jul 1, 2022
With the emergence of edge devices along with their local computation advantage over the cloud, distributed deep learning (DL) training on edge nodes becomes promising. In such a method, the cluster head of a cluster of edge nodes schedules all the DL training jobs from the cluster nodes. Using such a centralized scheduling method, the cluster head knows all the loads of the cluster nodes, which can avoid overloading the cluster nodes, but the head itself may become overloaded. To handle this problem, we first propose a multi-agent RL (MARL) system that enables each edge node to schedule its own jobs using RL. However, without the coordination between the nodes, action collision may occur, in which multiple nodes may schedule tasks to the same node and make it overloaded. To avoid these problems, we propose a system called Shielded ReinfOrcement learning (RL) based DL training on Edges (SROLE). In SROLE, each edge node schedules its own jobs using multi-agent RL. The shield deployed in a node checks action collisions and provides alternative actions to avoid the collisions. As the central shield node for the entire cluster may become a bottleneck, we further propose a decentralized shielding method, in which different shields are responsible for different regions in the cluster and they coordinate to avoid action collisions on the region boundaries. Our container-based emulation experiments show that SROLE reduces training time by up to 59% with 29% lower median resource utilization and reduces the number of action collisions by up to 48% compared to multi-agent RL and the centralized RL. Our real device experiments show that SROLE still reduces the training time by up to 53% with 28% lower median resource utilization than multi-agent RL and the centralized RL.
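The shielding idea described above, checking proposed scheduling actions for collisions and substituting alternatives, can be sketched as follows. This is a simplified, hypothetical illustration and not SROLE's actual algorithm: the function `shield`, the unit-load job model, the fixed per-node `capacity`, and the least-loaded fallback rule are all assumptions made for the example.

```python
def shield(proposals, load, capacity):
    """Approve or redirect proposed job placements to avoid overloading nodes.

    proposals: dict mapping agent name -> node it wants to schedule a job on
    load:      dict mapping node name -> current number of jobs
    capacity:  maximum number of jobs any node may hold (assumed uniform)
    Returns a dict mapping agent name -> approved node (None if no node has room).
    """
    load = dict(load)  # work on a copy so the caller's view is unchanged
    approved = {}
    for agent in sorted(proposals):  # deterministic processing order
        target = proposals[agent]
        if load[target] + 1 > capacity:
            # Collision: the proposed node would be overloaded, so offer the
            # least-loaded node that still has room as an alternative action.
            candidates = [n for n in sorted(load) if load[n] + 1 <= capacity]
            if not candidates:
                approved[agent] = None
                continue
            target = min(candidates, key=lambda n: load[n])
        load[target] += 1
        approved[agent] = target
    return approved


# Hypothetical cluster: three agents all propose the same node.
result = shield({"a": "n1", "b": "n1", "c": "n1"},
                {"n1": 1, "n2": 0, "n3": 2}, capacity=2)
print(result)
```

A single shield like this sees every proposal, which matches the centralized variant in the abstract; the decentralized variant would run one such check per region and coordinate only at region boundaries.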
- Research Article
42
- 10.1007/s11633-022-1383-7
- Mar 31, 2023
- Machine Intelligence Research
Offline reinforcement learning leverages previously collected offline datasets to learn optimal policies without needing access to the real environment. Such a paradigm is also desirable for multi-agent reinforcement learning (MARL) tasks, given the combinatorially increased interactions among agents and with the environment. However, in MARL, the paradigm of offline pre-training with online fine-tuning has not been studied, nor are datasets or benchmarks for offline MARL research available. In this paper, we facilitate the research by providing large-scale datasets and using them to examine the usage of the decision transformer in the context of MARL. We investigate the generalization of MARL offline pre-training in the following three aspects: 1) between single agents and multiple agents, 2) from offline pre-training to online fine-tuning, and 3) to multiple downstream tasks with few-shot and zero-shot capabilities. We start by introducing the first offline MARL dataset with diverse quality levels based on the StarCraft II environment, and then propose the novel architecture of the multi-agent decision transformer (MADT) for effective offline learning. MADT leverages the transformer’s ability for sequence modelling and integrates it seamlessly with both offline and online MARL tasks. A significant benefit of MADT is that it learns generalizable policies that can transfer between different types of agents under different task scenarios. On the StarCraft II offline dataset, MADT outperforms the state-of-the-art offline reinforcement learning (RL) baselines, including BCQ and CQL. When applied to online tasks, the pre-trained MADT significantly improves sample efficiency and enjoys strong performance in both few-shot and zero-shot cases. To the best of our knowledge, this is the first work that studies and demonstrates the effectiveness of offline pre-trained models in terms of sample efficiency and generalizability enhancements for MARL.
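For readers unfamiliar with the decision transformer mentioned above: in its original single-agent formulation, each trajectory is turned into a sequence in which every timestep is conditioned on the return-to-go, the sum of rewards from that step to the end of the episode. The abstract does not say whether MADT uses the same conditioning, so the sketch below illustrates only that general preprocessing step, with a hypothetical function name and reward sequence.

```python
def returns_to_go(rewards):
    """Undiscounted return-to-go for each timestep of one offline trajectory.

    rtg[t] is the sum of rewards from step t through the final step.
    """
    rtg = []
    total = 0.0
    # Accumulate from the last step backwards, then restore time order.
    for r in reversed(rewards):
        total += r
        rtg.append(total)
    rtg.reverse()
    return rtg


# Hypothetical reward sequence from a logged episode.
print(returns_to_go([1.0, 0.0, 2.0]))
```

Conditioning on this quantity is what lets a sequence model trained on mixed-quality offline data be prompted for high-return behavior at test time.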