Path planning is a key task in robotics and plays an important role in fields such as autonomous driving and logistics delivery. Our work addresses the dual challenges of training efficiency and composite optimization in path planning with Deep Reinforcement Learning (DRL). We introduce the Efficient Progressive Policy Enhancement (EPPE) framework, which combines the advantages of sparse rewards, which steer the agent toward a globally optimal policy, with process rewards, which provide real-time feedback for policy adjustment. The framework not only substantially improves policy-learning efficiency but also resolves the reward-coupling issues introduced by process rewards, thereby ensuring convergence to a globally optimal policy. Within the framework, the initial reward structure incorporates guiding rewards, a type of process reward derived from conventional path-planning algorithms, and assigns them large weights to provide real-time feedback, which markedly improves training efficiency. In addition, the Incremental Reward Adjustment (IRA) model progressively increases the reward weights of the composite-optimization terms, while the Fine-tuning Policy Optimization (FPO) model supports IRA by gradually adjusting the learning rate throughout training. Simulation experiments demonstrate the advantage of our framework in composite path optimization: in static-obstacle environments, the time and distance to reach the target improve by at least 10.4% over seven benchmark algorithms; in mixed-obstacle environments, the improvements are at least 19.1% and 18.2%, respectively. Our framework also substantially improves the training efficiency of DRL.
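The reward scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the linear schedules, and all constants are assumptions chosen only to show the shape of a composite reward whose optimization weight grows incrementally (IRA-style) while the learning rate is annealed in step (FPO-style).

```python
def composite_reward(r_sparse, r_guide, r_opt, w_opt, w_guide=1.0):
    """Sparse goal reward plus weighted guiding and composite-optimization terms.

    r_sparse: reward issued only on reaching the target (global objective).
    r_guide:  process reward from a conventional path-planning heuristic.
    r_opt:    composite-optimization term (e.g. time/distance shaping).
    """
    return r_sparse + w_guide * r_guide + w_opt * r_opt

def ira_weight(step, total_steps, w_init=0.1, w_final=1.0):
    """Incrementally raise the composite-optimization weight over training
    (a linear schedule is assumed here for illustration)."""
    frac = min(step / total_steps, 1.0)
    return w_init + (w_final - w_init) * frac

def fpo_learning_rate(step, total_steps, lr_init=1e-3, lr_final=1e-4):
    """Gradually lower the learning rate as the reward weights shift,
    so the policy is fine-tuned rather than destabilized."""
    frac = min(step / total_steps, 1.0)
    return lr_init + (lr_final - lr_init) * frac
```

Both schedules are monotone in the training step, so early training is dominated by the heavily weighted guiding reward (fast feedback), while late training emphasizes the composite-optimization terms under a small learning rate.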