Evolutionary dispersal of ecological species via Multi-Agent Deep Reinforcement Learning
- Research Article
- 10.17762/turcomat.v12i7.2961
- Apr 19, 2021
- Turkish Journal of Computer and Mathematics Education (TURCOMAT)
This paper presents an approach that combines two methodologies: Maximum Entropy Inverse Reinforcement Learning and Multi-Agent Deep Reinforcement Learning. Each method is discussed in adequate depth. The nature of the data is specified, along with how it is prepared to suit the purpose. Keras is used as one of the tools to construct modest deep neural networks. For the results, the relation between the outcomes of each method is established and explained. Finally, the directions in which the approach could be taken in the future are briefly analyzed.
- Conference Article
6
- 10.65109/cdmt9885
- May 6, 2024
Deep reinforcement learning (DRL) is gaining popularity in task-offloading problems because it can adapt to dynamic changes and minimize online computational complexity. However, the various types of continuous and discrete resource constraints on user devices (UDs) and mobile edge computing (MEC) servers pose challenges to the design of an efficient DRL-based task-offloading strategy. Existing DRL-based task-offloading algorithms focus on the constraints of the UDs, assuming the availability of enough storage resources on the server. Moreover, existing multiagent DRL (MADRL)-based task-offloading algorithms are homogeneous agents and consider homogeneous constraints as a penalty in their reward function. In this work, we propose a novel combinatorial client-master MADRL (CCM_MADRL) algorithm for task offloading in mobile edge computing (CCM_MADRL_MEC) that allows UDs to decide their resource requirements and the server to make a combinatorial decision based on the UDs' requirements. CCM_MADRL_MEC is the first MADRL approach in task offloading to consider server storage capacity in addition to the constraints of the UDs. By taking advantage of the combinatorial action selection, CCM_MADRL_MEC has shown superior convergence over existing benchmark and heuristic algorithms.
- Research Article
1
- 10.3389/fpls.2025.1610571
- Oct 15, 2025
- Frontiers in Plant Science
Introduction: The rational structure of forest stands plays a crucial role in maintaining ecosystem functions, enhancing community stability, and ensuring sustainable management. Although progress has been made in stand structure optimization, most existing studies focus on static improvements and fail to adequately capture the dynamic nature of stand development. In addition, commonly used heuristic and traditional methods often suffer from limitations in computational efficiency and generalization ability.
Methods: To address these challenges, this study explores the potential and advantages of multi-agent deep reinforcement learning in forest management, offering innovative insights and methods for achieving sustainable forest ecosystem management. Using the secondary forests of Pinus yunnanensis in southwest China as the research subject, we constructed an objective function and constraints based on spatial and non-spatial structure indexes. Selective harvesting and replanting were employed as optimization measures, and experiments were conducted on five circular plots to compare the performance of multi-agent deep reinforcement learning with that of multi-agent reinforcement learning. To account for the dynamic characteristics of stand structure, we further integrated structure prediction with multi-agent deep reinforcement learning for dynamic optimization across the five plots.
Results: The results indicate that multi-agent deep reinforcement learning consistently outperformed multi-agent reinforcement learning across all plots. For the initial objective function values of each plot (0.3501, 0.3799, 0.3982, 0.3344, 0.4294), the optimized results obtained through multi-agent deep reinforcement learning (0.5378, 0.5861, 0.5860, 0.5130, 0.6034) were significantly superior to the maximum objective function values achieved by multi-agent reinforcement learning (0.5302, 0.5369, 0.5766, 0.5014, 0.5906). Furthermore, the dynamic optimization results incorporating structure prediction demonstrate that all plots progressively approached an ideal stand condition over multiple optimization cycles (0.5718, 0.6101, 0.6455, 0.5863, 0.6210), leading to a more balanced stand structure and improved long-term stability.
Discussion: This study proposes a novel stand structure optimization method that integrates multi-agent deep reinforcement learning with structure prediction, providing theoretical support and practical guidance for the sustainable management of Pinus yunnanensis secondary forests.
- Research Article
5
- 10.1109/tmc.2025.3539945
- Jul 1, 2025
- IEEE Transactions on Mobile Computing
Multi-access edge computing has become an effective paradigm to provide offloading services for computation-intensive and delay-sensitive tasks on vehicles. However, high mobility of vehicles usually incurs spatio-temporal load-imbalances among edge servers. Therefore, task migration is employed to maintain dynamic workload balancing by transmitting excessive tasks from overloaded to underloaded servers. Recent studies adopt deep reinforcement learning approaches to generate offloading and migration decisions based on current observations of systems. However, we argue that the migration direction is highly dependent on vehicular movements, and task migration towards the wrong direction could lead to additional delays. Therefore, we emphasize the importance of guiding task migration via exploring prospective trajectories of vehicles. We propose a Mobility-Aware Cooperative Multi-Agent (MCMA) deep reinforcement learning approach to make vehicle-by-vehicle decisions in multi-edge computation offloading scenarios. A two-stage decision framework is designed to solve the joint optimization problem of computation offloading and resource allocation. Additionally, an Informer-based multi-step vehicular trajectory prediction module is incorporated to enhance the capability of forecasting vehicular movements. Extensive experiments and analysis are conducted on synthetic and realistic scenarios, showing that our approach consistently outperforms both heuristic and DRL-based methods. The simulation scenarios and source codes are publicly available here.
- Research Article
- 10.1155/int/1477541
- Jan 1, 2026
- International Journal of Intelligent Systems
This work introduces a multiagent deep reinforcement learning (MADRL) framework for energy harvesting (EH) in unmanned aerial vehicle (UAV) networks aided by reconfigurable intelligent surfaces (RIS). The core goal is to maximize the amount of harvested energy subject to quality of service (QoS) constraints in dynamic wireless setups. The considered model involves centralized training alongside decentralized execution, together with replay-based learning to yield stable convergence. Extensive experiments compare MADRL with evolution strategies (ES), deep deterministic policy gradient (DDPG), stochastic DDPG (SD3), and state-of-the-art twin delayed DDPG (TD3)-based approaches. The outcomes verify that MADRL achieves an average throughput of over 300 Mbps when deployed with four UAVs, exceeding DDPG and closing in on TD3 and adaptive TD3, while consuming minimal processing time and memory resources. In time-domain tests, MADRL maintains an EH fraction in the range of approximately 0.27–0.31 (mean ≈ 0.29), and in dual-domain evaluation, it sustains an EH fraction of approximately 0.73–0.75 (mean ≈ 0.74), indicating robust energy performance under both scenarios. Parameter sensitivity analysis also confirms the selection of the hyperparameters α, β, η, and γ as optimal trade-offs at α = 0.6, β = 0.8, η = 3 × 10⁻⁴, γ = 0.98. The computational tests verify potential practicability in real-time applications, where MADRL takes only 0.42 s of processing time and 1.8 GB of memory per episode. These results indicate the potential applicability of MADRL in scalable UAV-RIS networks and in energy-efficient wireless setups.
- Supplementary Content
5
- 10.17638/03077940
- Mar 6, 2020
- University of Liverpool
Deep Neural Networks enable Reinforcement Learning (RL) agents to learn behaviour policies directly from high-dimensional observations. As a result, the field of Deep Reinforcement Learning (DRL) has seen a great number of successes. Recently the sub-field of Multi-Agent DRL (MADRL) has received an increased amount of attention. However, considerations are required when using RL in Multi-Agent Systems. For instance Independent Learners (ILs) lack the convergence guarantees of many single-agent RL approaches, even in domains that do not require a MADRL approach. Furthermore, ILs must often overcome a number of learning pathologies to converge upon an optimal joint-policy. Numerous IL approaches have been proposed to facilitate cooperation, including hysteretic Q-learning (Matignon et al., 2007) and leniency (Panait et al., 2006). Recently LMRL2, a variation of leniency, proved robust towards a number of pathologies in low-dimensional domains, including miscoordination, relative overgeneralization, stochasticity, the alter-exploration problem and the moving target problem (Wei and Luke, 2016). In contrast, the majority of work on ILs in MADRL focuses on an amplified moving target problem, caused by neural networks being trained with potentially obsolete samples drawn from experience replay memories. In this thesis we combine advances from research on ILs with DRL algorithms. However, first we evaluate the robustness of tabular approaches along each of the above pathology dimensions. Upon identifying a number of weaknesses that prevent LMRL2 from consistently converging upon optimal joint-policies we propose a new version of leniency, Distributed-Lenient Q-learning (DLQ). We find DLQ delivers state of the art performances in strategic-form and Markov games from Multi-Agent Reinforcement Learning literature. We subsequently scale leniency to MADRL, introducing Lenient (Double) Deep Q-Network (LDDQN). 
We empirically evaluate LDDQN with extensions of the Cooperative Multi-Agent Object Transportation Problem (Bucsoniu et al., 2010), finding that LDDQN outperforms hysteretic deep Q-learners in domains with multiple dropzones yielding stochastic rewards. Finally, to evaluate deep ILs along each pathology dimension we introduce a new MADRL environment: the Apprentice Firemen Game (AFG). We find lenient and hysteretic approaches fail to consistently learn near optimal joint-policies in the AFG. To address these pathologies we introduce Negative Update Intervals-DDQN (NUI-DDQN), a MADRL algorithm which discards episodes yielding cumulative rewards outside the range of expanding intervals. NUI-DDQN consistently gravitates towards optimal joint-policies in deterministic and stochastic reward settings of the AFG, overcoming the outlined pathologies.
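The hysteretic Q-learning baseline cited in the abstract above (Matignon et al., 2007) can be illustrated with a minimal tabular sketch. It uses two learning rates, so that positive temporal-difference errors are applied at a higher rate than negative ones, which makes an independent learner optimistic about teammates' exploratory mistakes. The function name, the table shape, and the values of `alpha`, `beta`, and `gamma` are assumptions chosen for illustration; they are not taken from the thesis.

```python
import numpy as np

def hysteretic_update(q, s, a, r, s_next, alpha=0.1, beta=0.01, gamma=0.9):
    """One tabular hysteretic Q-learning step (illustrative parameter values).

    Positive TD errors use the larger rate `alpha`; negative TD errors use
    the smaller rate `beta`, so bad outcomes are only partially believed.
    """
    delta = r + gamma * np.max(q[s_next]) - q[s, a]
    rate = alpha if delta >= 0 else beta
    q[s, a] += rate * delta
    return delta

# Hypothetical 2-state, 2-action table, initialised to zero.
q = np.zeros((2, 2))
hysteretic_update(q, s=0, a=0, r=1.0, s_next=1)   # positive error, rate alpha
hysteretic_update(q, s=0, a=0, r=-1.0, s_next=1)  # negative error, rate beta
```

With `beta` equal to `alpha` the update reduces to ordinary Q-learning; with `beta` equal to zero the learner ignores negative errors entirely.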
- Research Article
3
- 10.3390/s22051746
- Feb 23, 2022
- Sensors (Basel, Switzerland)
Future network services must adapt to highly dynamic uplink and downlink traffic. To fulfill this requirement, the 3rd Generation Partnership Project (3GPP) proposed dynamic time division duplex (D-TDD) technology in Long Term Evolution (LTE) Release 11. Afterward, the 3GPP RAN#86 meeting clarified that 5G NR needs to support dynamic adjustment of the duplex pattern (transmission direction) in the time domain. Although 5G NR provides a more flexible duplex pattern, how to configure an effective duplex pattern according to service traffic is still an open research area. In this research, we propose a distributed multi-agent deep reinforcement learning (MARL) based decentralized D-TDD configuration method. First, we model the D-TDD configuration problem as a dynamic programming problem. Given the buffer length of all UE, we model the D-TDD configuration policy as a conditional probability distribution. Our goal is to find a D-TDD configuration policy that maximizes the expected discounted return of the sum rate over all UE. Second, in order to reduce signaling overhead, we design a fully decentralized solution with distributed MARL technology. Each agent in MARL makes decisions based only on local observations. We regard each base station (BS) as an agent, and each agent configures the uplink and downlink time slot ratio according to the length of the intra-BS user (UE) queue buffer. Third, in order to address the loss in overall system revenue caused by the lack of global information in MARL, we apply leniency control and a binary LSTM (BLSTM) based auto-encoder. The leniency controller effectively controls the Q-value estimation process in MARL according to the Q-value and current network conditions, and the auto-encoder compensates for the inability of leniency control to handle complex environments and high-dimensional data. Through parallel distributed training, the global D-TDD policy is obtained. 
This method deploys the MARL algorithm on the Mobile Edge Computing (MEC) server of each BS and uses the storage and computing capabilities of the server for distributed training. The simulation results show that the proposed distributed MARL converges stably in various environments and performs better than the distributed deep reinforcement learning algorithm.
- Dissertation
- 10.31979/etd.tpey-94k6
- Dec 22, 2020
This project was motivated by the search for an AI method that moves towards Artificial General Intelligence (AGI), that is, one more similar to the learning behavior of human beings. As of today, Deep Reinforcement Learning (DRL) is the closest to AGI compared with other machine learning methods. To better understand DRL, we compare and contrast it with other related methods: Deep Learning, Dynamic Programming and Game Theory. We apply one of the state-of-the-art DRL algorithms, Proximal Policy Optimization (PPO), to robot walker locomotion, a simple yet challenging environment with an inherently continuous and high-dimensional state/action space. The end goal of this project is to train the agent by finding the optimal sequential actions (policy/strategy) of multi-walkers, leading them to move forward as far as possible to maximize the accumulated reward (performance). This goal can be accomplished by tuning the hyperparameters of the PPO algorithm while monitoring performance in the multi-agent DRL (MADRL) settings. We draw three conclusions from our findings based on the various MADRL experiments: 1) Unlike DL with explicit target labels, DRL needs a larger minibatch size for a better estimate of values from various gradients. Therefore, the minibatch size and its pool size (experience replay buffer) are critical hyperparameters in the PPO algorithm. 2) For homogeneous multi-agent environments, there is a mutual transferability between single-agent and multi-agent environments, so that tuned hyperparameters can be reused. 3) For homogeneous multi-agent environments with a well-tuned hyperparameter set, parameter sharing is a better strategy for MADRL in terms of performance and efficiency, with fewer parameters and less memory. To conclude, DRL, as reward-driven, sequential and evaluative learning, would be closer to AGI if multiple DRL agents learn to collaborate to capture the true signal from the shared environment. 
This work provides one instance of implicit cooperative learning of MADRL.
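For readers unfamiliar with PPO, the clipped surrogate objective that the algorithm maximizes over each minibatch can be sketched as follows. This is a generic illustration of the standard clipped objective, not the dissertation's code; the function name, the sample values, and the clipping parameter `eps=0.2` are assumptions.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Mean clipped surrogate objective over a minibatch.

    ratio:     new-policy / old-policy probability ratios, one per sample
    advantage: advantage estimates, one per sample
    eps:       clipping range (illustrative default)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # The element-wise minimum removes the incentive to move the ratio
    # outside [1 - eps, 1 + eps] in the direction the advantage favours.
    return float(np.mean(np.minimum(unclipped, clipped)))

# Hypothetical two-sample minibatch.
ratio = np.array([1.5, 0.5])
advantage = np.array([1.0, -1.0])
objective = ppo_clip_objective(ratio, advantage)
```

Because the objective is an average over the minibatch, a small minibatch gives a noisier estimate, which is consistent with the first conclusion in the abstract about minibatch size being a critical hyperparameter.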
- Research Article
156
- 10.3390/s23073625
- Mar 30, 2023
- Sensors
Deep reinforcement learning has produced many success stories in recent years. Some example fields in which these successes have taken place include mathematics, games, health care, and robotics. In this paper, we are especially interested in multi-agent deep reinforcement learning, where multiple agents present in the environment learn not only from their own experiences but also from each other, and in its applications in multi-robot systems. In many real-world scenarios, one robot might not be enough to complete the given task on its own, and, therefore, we might need to deploy multiple robots that work together towards a common global objective of finishing the task. Although multi-agent deep reinforcement learning and its applications in multi-robot systems are of tremendous significance from theoretical and applied standpoints, the latest survey in this domain dates to 2004, and it covers only traditional learning applications because deep reinforcement learning had not yet been invented. We classify the reviewed papers in our survey primarily based on their multi-robot applications. Our survey also discusses a few challenges that the current research in this domain faces and provides a potential list of future applications involving multi-robot systems that can benefit from advances in multi-agent deep reinforcement learning.
- Conference Article
- 10.46254/gc02.20240072
- Dec 1, 2024
With the increasing real-time demand in the internet era, especially dynamic requests for last-mile delivery, route planning is becoming more computationally expensive than ever before. Many supply chains (SCs) choose the joint distribution of multiple depots to reduce transportation costs and delivery times. However, providing real-time and high-quality solutions for such complex routing problems remains challenging. Current solution methods like mathematical programming and heuristics suffer from scalability issues and long computation times. In contrast, artificial intelligence, especially Deep Reinforcement Learning (DRL), provides a general-purpose framework for sequential decision-making that has produced good results for many challenging real-life problems. However, applying DRL to route multiple vehicles is nontrivial, as the joint distribution requires an effective method that facilitates collaboration and communication among all agents while they carry out the delivery mission. In this research, a collaborative Multi-Agent Deep Reinforcement Learning (MADRL) approach is proposed for routing multiple vehicles in the SC. The proposed MADRL model leverages the power of two frameworks, deep learning and reinforcement learning, to generate routing policies for all agents in real time. Experimental results show the ability of the proposed learning model to obtain fast and quality solutions for complex delivery problems. Furthermore, the generalization ability of MADRL is also validated by testing the well-trained model on different scale problems.
- Research Article
1
- 10.1177/14780771241287827
- Nov 5, 2024
- International Journal of Architectural Computing
To address the unprecedented challenges of construction pressured by the global climate crisis, housing shortage, and growing shortage of skilled labor, this research presents a radical shift in the construction lifecycle of buildings, from linear processes that produce static continuous buildings to interrelational processes linking adaptive eco-systems of collaborative robots and reconfigurable building parts. Inspired by natural builders, the interdisciplinary field of collective robotic construction (CRC) offers the potential for scalable, adaptive, and resilient construction with simple robots. We establish a design framework for autonomous collaborative robotic construction (ACRC) through modular robotic material eco-systems (MRMES) trained with deep multi-agent reinforcement learning (DMARL). This involves the integration of three core aspects: (1) modular robotic material eco-systems, (2) cyber-physical simulation and control with bidirectional feedback, and (3) adaptive intelligence through deep multi-agent reinforcement learning. The framework is implemented through three comparable case studies for collaborative modular robotic assembly of reconfigurable building parts.
- Conference Article
6
- 10.1109/istt56288.2022.9966551
- Nov 14, 2022
Multi-agent Deep Reinforcement Learning (MADRL) has been applied to a plethora of state-of-the-art applications such as resource allocations and network routing in both centralized and distributed manners. This paper investigates the performance of centralized and distributed MADRL in Dynamic Spectrum Access (DSA). We consider a multichannel wireless network with a shared bandwidth divided into k channels. The objective of the MADRL is to develop a spectrum access strategy by learning in both a centralized and distributed manner. To evaluate the performance of centralized and distributed MADRL, we tackle the spectrum access problem by applying centralized MADRL and distributed MADRL. Experimental results show that distributed MADRL outperforms the centralized MADRL by 15% in collision avoidance and accumulated rewards in DSA.
- Research Article
7
- 10.1109/tcds.2023.3323987
- Apr 1, 2024
- IEEE Transactions on Cognitive and Developmental Systems
In recent years, cooperative multiagent deep reinforcement learning (MADRL) has received increasing research interest and has been widely applied to computer games and coordinated multirobot systems, etc. However, it is still challenging to realize high-solution quality and learning efficiency for MADRL under the conditions of incomplete and noisy observations. To this end, this article proposes an MADRL approach with grouped cognitive feature representation (GCEN), following the paradigm of centralized training and decentralized execution (CTDE). Different from previous works, GCEN incorporates a new cognitive feature representation that combines a grouped attention mechanism and a training approach using mutual information (MI). The grouped attention mechanism is proposed to selectively extract entity features within the observation field for each agent while avoiding the influence of irrelevant observations. The MI regularization term is designed to guide the agents to learn grouped cognitive features based on global information, aiming to mitigate the influence of partial observations. The proposed GCEN approach can be extended as a feature representation module to different MADRL methods. Extensive experiments on the challenging level-based foraging and StarCraft II micromanagement benchmarks were conducted to illustrate the effectiveness and advantages of the proposed approach. Compared with seven representative MADRL algorithms, our proposed approach achieves state-of-the-art performance in winning rates and training efficiency. Experimental results further demonstrate that GCEN has improved generalization ability across varying sight ranges.
- Research Article
34
- 10.1016/j.inffus.2022.08.001
- Aug 4, 2022
- Information Fusion
An inductive heterogeneous graph attention-based multi-agent deep graph infomax algorithm for adaptive traffic signal control
- Research Article
1
- 10.54097/hset.v39i.6655
- Apr 1, 2023
- Highlights in Science Engineering and Technology
Multi-agent deep reinforcement learning based on value factorization is one of the many multi-agent deep reinforcement learning methods and a research hotspot in the field. To address environmental instability and the exponential growth of the action space in multi-agent systems, it uses constraints to decompose the joint action-value function of the multi-agent system into a specific combination of individual action-value functions. This paper first explains the motivation for factorizing the value function and then introduces the fundamentals of multi-agent deep reinforcement learning. Value-factorization algorithms are then divided into simple-factorization and attention-mechanism-based algorithms, depending on whether additional mechanisms are incorporated and which ones are introduced. Several typical algorithms are then introduced, and their advantages and disadvantages are compared and analyzed. Finally, the reinforcement learning content covered in the paper is summarized.
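The decomposition described above can be illustrated with a minimal sketch. It assumes the simplest additive form, in which the joint action value is the sum of the individual action values; the abstract does not say which combination the surveyed algorithms use, so this choice, the function names, and the sample values are assumptions. Under an additive form, each agent maximizing its own values yields the same joint action as a search over the exponentially large joint action space.

```python
import itertools
import numpy as np

def joint_q_additive(per_agent_q, joint_action):
    """Joint value as the sum of each agent's value for its own action."""
    return sum(q[a] for q, a in zip(per_agent_q, joint_action))

def greedy_joint_action(per_agent_q):
    """Decentralized greedy choice: each agent maximizes its own values."""
    return tuple(int(np.argmax(q)) for q in per_agent_q)

# Hypothetical per-agent action values for one state (2 agents, 3 actions each).
per_agent_q = [np.array([0.1, 0.7, 0.2]), np.array([0.5, 0.3, 0.9])]

decentralized = greedy_joint_action(per_agent_q)

# Brute-force search over all 3 * 3 joint actions, for comparison only.
joint_space = itertools.product(range(3), repeat=2)
centralized = max(joint_space, key=lambda ja: joint_q_additive(per_agent_q, ja))
```

Here the per-agent search costs a number of evaluations that grows linearly with the number of agents, whereas the brute-force search grows exponentially, which is the scaling problem that factorization is meant to avoid.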