Invalid Actions Research Articles

Connected and Automated Vehicles (CAVs) are widely regarded as the future of transportation because they can increase traffic throughput, improve safety, and reduce energy consumption. One challenging CAV research problem is intelligent collaborative driving decision-making in dynamically changing traffic scenarios. Sixth-generation mobile communication technology (6G) plays an important role in enabling intelligent CAV driving in the Internet of Vehicles (IoV) by providing significantly higher system performance, spectral efficiency, reliability, and security. This paper formulates the collaborative decision problem of CAVs at unsignalized intersections as a multi-agent reinforcement learning (MARL) problem, in which CAVs entering an intersection learn a cooperative strategy to cross it safely while minimizing their travel time. An efficient and scalable MARL algorithm based on Proximal Policy Optimization (PPO) is applied to dynamic intersection scenarios, with a parameter-sharing mechanism extending PPO to the multi-agent setting. A local reward is used instead of a global reward to promote cooperation between agents and achieve scalability. In addition, an action masking mechanism improves learning efficiency by filtering out invalid actions at each step. A joint simulation platform is built to verify the performance of the resulting Parameter-Sharing PPO (PS-PPO) algorithm. The simulation results show that PS-PPO not only keeps the collision rate low but also greatly reduces vehicular travel time. The algorithm promotes cooperation among CAVs and performs well on collaborative traffic flow at unsignalized intersections, especially in high-traffic-volume scenarios, helping to improve the safety and efficiency of traffic flow at such intersections.
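
The action masking step mentioned in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names masked_softmax and sample_action, the three-action example, and the choice of masking logits before the softmax are all assumptions made for illustration. The idea shown is that invalid actions receive a logit of negative infinity, so their probability is exactly zero and the policy can never sample them.

```python
import math
import random


def masked_softmax(logits, valid_mask):
    """Return action probabilities with invalid actions forced to zero.

    `logits` are raw policy scores; `valid_mask[i]` is True when action i
    is allowed in the current state. Both names are hypothetical.
    """
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Replace the logits of invalid actions with -inf so exp() yields 0.
    masked = [l if ok else -math.inf for l, ok in zip(logits, valid_mask)]
    top = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(m - top) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, valid_mask, rng=random):
    """Sample an action index from the masked distribution."""
    probs = masked_softmax(logits, valid_mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical example: action 1 is invalid in the current state.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
action = sample_action([1.0, 2.0, 3.0], [True, False, True])
```

Because the mask is applied before normalization, the remaining probabilities still sum to one, and the masked action contributes nothing to the sampled trajectory.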

We propose a novel application of reinforcement learning (RL) with invalid action masking and a novel training methodology for routing and wavelength assignment (RWA) in fixed-grid optical networks, and we demonstrate that the learned policy generalizes to a realistic traffic matrix unseen during training. Through the introduction of invalid action masking and a new training method, the applicability of RL to RWA in fixed-grid networks is extended from considering connection requests between nodes to servicing demands of a given bit rate, such that lightpaths can be used to service multiple demands subject to capacity constraints. We outline the additional challenges of this RWA problem relative to the connection-request RWA problem considered in the literature; for this problem, we found that standard RL performed poorly compared with baseline heuristics. Thus, we propose invalid action masking and a novel training method to improve the efficacy of the RL agent. With invalid action masking, domain knowledge is embedded in the RL model to constrain the action space of the RL agent to lightpaths that can support the current request, reducing the size of the action space and thus increasing the efficacy of the agent. In the proposed training method, the RL model is trained on a simplified version of the problem and evaluated on the target RWA problem, increasing the efficacy of the agent compared with training directly on the target problem. RL with invalid action masking and this training method outperforms standard RL and three state-of-the-art heuristics, namely, k shortest path first fit, first-fit k shortest path, and k shortest path most utilized, consistently across uniform and nonuniform traffic in terms of the number of accepted transmission requests for two real-world core topologies, NSFNET and COST–239. The RWA runtime of the proposed RL model is comparable to that of these heuristic approaches, demonstrating the potential for real-world applicability. Moreover, we show that the RL agent trained on uniform traffic generalizes well to a realistic nonuniform traffic distribution not seen during training, outperforming the heuristics on this traffic. Visualization of the learned RWA policy reveals a strategy that differs significantly from those of the heuristic baselines in how services are distributed across channels and across links.
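
As a rough illustration of constraining the action space to lightpaths that can support the current request, the sketch below builds a validity mask over (path, wavelength) pairs and then picks the first valid pair. Everything in it is an assumption for illustration: the names build_action_mask and first_fit, the toy topology, and the simplification that an action is valid only when the wavelength is free on every link of the path. It ignores the abstract's case of reusing an existing lightpath with spare capacity, and the path-first search order is only one reading of a first-fit baseline, not the paper's definition.

```python
def build_action_mask(candidate_paths, occupied, num_wavelengths):
    """mask[p][w] is True when wavelength w is free on every link of path p.

    `candidate_paths` is a list of paths, each a list of link identifiers;
    `occupied` maps a link to the set of wavelengths already in use on it.
    Both structures are hypothetical stand-ins for the network state.
    """
    return [
        [
            all(w not in occupied.get(link, set()) for link in path)
            for w in range(num_wavelengths)
        ]
        for path in candidate_paths
    ]


def first_fit(mask):
    """Return the first valid (path_index, wavelength) pair, or None if blocked."""
    for p, row in enumerate(mask):
        for w, ok in enumerate(row):
            if ok:
                return p, w
    return None


# Hypothetical toy state: two candidate paths from A to C, three wavelengths.
paths = [[("A", "B"), ("B", "C")], [("A", "D"), ("D", "C")]]
occupied = {("A", "B"): {0, 1}, ("B", "C"): {2}, ("A", "D"): {0}}
mask = build_action_mask(paths, occupied, num_wavelengths=3)
choice = first_fit(mask)  # first path is fully blocked, so the second is used
```

An RL agent would use the same mask to zero out the probability of invalid (path, wavelength) actions rather than scanning them in a fixed order as the heuristic does.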

Articles published on Invalid Actions

Optimizing intelligent penetration path planning using reinforcement learning: A focus on valid action masking and sample enhancement

The unmanned vehicle on-ramp merging model based on AM-MAPPO algorithm

A multi-agent reinforcement learning method for distribution system restoration considering dynamic network reconfiguration

A Self-Attention-Based Deep Reinforcement Learning Approach for AGV Dispatching Systems.

Deep-learning based autonomous-exploration for UAV navigation

System-of-systems approach to spatio-temporal crowdsourcing design using improved PPO algorithm based on an invalid action masking

Data-driven dynamic pricing and inventory management of an omni-channel retailer in an uncertain demand environment

Model-Based Predictive Control and Reinforcement Learning for Planning Vehicle-Parking Trajectories for Vertical Parking Spaces.

Exploring the Use of Invalid Action Masking in Reinforcement Learning: A Comparative Study of On-Policy and Off-Policy Algorithms in Real-Time Strategy Games

EMExplorer: an episodic memory enhanced autonomous exploration strategy with Voronoi domain conversion and invalid action masking

Solving job shop scheduling problems via deep reinforcement learning

A Multi-Agent Deep Reinforcement Learning-Based Popular Content Distribution Scheme in Vehicular Networks.

Cooperative Data Collection With Multiple UAVs for Information Freshness in the Internet of Things

A reinforcement learning-based approach for online bus scheduling

Vehicular intelligent collaborative intersection driving decision algorithm in Internet of Vehicles

TD3-Based EMS Using Action Mask and Considering Battery Aging for Hybrid Electric Dump Trucks

A Deep Reinforcement Learning Approach for Optimal Scheduling of Heavy-haul Railway

A Novel Hybrid-ARPPO Algorithm for Dynamic Computation Offloading in Edge Computing

Global path planning algorithm based on double DQN for multi-tasks amphibious unmanned surface vehicle

Techniques for applying reinforcement learning to routing and wavelength assignment problems in optical fiber communication networks
