Dynamic flexible job shop co-scheduling optimization based on graph neural network and deep reinforcement learning

Similar Papers
  • Research Article
  • Cited by 61
  • 10.1016/j.tics.2020.09.002
Artificial Intelligence and the Common Sense of Animals.
  • Oct 8, 2020
  • Trends in Cognitive Sciences
  • Murray Shanahan + 3 more

  • Book Chapter
  • Cited by 7
  • 10.5772/intechopen.111651
Graph Neural Networks and Reinforcement Learning: A Survey
  • Nov 15, 2023
  • Fatemeh Fathinezhad + 3 more

Graph neural network (GNN) is an emerging field of research that tries to generalize deep learning architectures to work with non-Euclidean data. Nowadays, combining deep reinforcement learning (DRL) with GNN for graph-structured problems, especially in multi-agent environments, is a powerful technique in modern deep learning. From the computational point of view, multi-agent environments are inherently complex, because future rewards depend on the joint actions of multiple agents. This chapter tries to examine different types of applying GNN and DRL techniques in the most common representations of multi-agent problems and their challenges. In general, the fusion of GNN and DRL can be addressed from two different points of view. First, GNN is used to influence the DRL performance and improve its formulation. Here, GNN is applied in relational DRL structures such as multi-agent and multi-task DRL. Second, DRL is used to improve the application of GNN. From this viewpoint, DRL can be used for a variety of purposes including neural architecture search and improving the explanatory power of GNN predictions.

  • Research Article
  • Cited by 57
  • 10.1016/j.dsp.2022.103419
A new ensemble deep graph reinforcement learning network for spatio-temporal traffic volume forecasting in a freeway network
  • Jan 29, 2022
  • Digital Signal Processing
  • Pan Shang + 5 more

  • Research Article
  • Cited by 90
  • 10.1016/j.cie.2023.109718
Dynamic scheduling for flexible job shop with insufficient transportation resources via graph neural network and deep reinforcement learning
  • Oct 31, 2023
  • Computers & Industrial Engineering
  • Min Zhang + 3 more

  • Research Article
  • Cited by 2
  • 10.1109/access.2025.3526627
Multimedia Tasks-Oriented Edge Computing Offloading Scheme Based on Graph Neural Network in Vehicular Networks
  • Jan 1, 2025
  • IEEE Access
  • Yong Huang

With the advancement of vehicular networking technologies, in-vehicle devices are increasingly involved in complex computational tasks, posing new challenges to the vehicles’ computational capabilities and energy consumption. This study addresses the complexities associated with task deployment by proposing a novel integrated framework that synergizes the strengths of deep reinforcement learning (DRL) and graph neural networks (GNNs). The proposed framework leverages the relational capabilities of GNNs to capture inter-task dependencies while utilizing the adaptive learning capabilities of DRL to optimize task offloading decisions in real time. First, we propose a GNN-based task offloading scheme that utilizes graph structures to represent task dependencies and optimizes task deployment through graph neural networks. Second, DRL is introduced to learn optimal task deployment policies, enhancing the efficiency and accuracy of task deployment. Finally, the performance of the proposed scheme is validated through simulation experiments, including metrics such as model stability, subtask deployment error rate, expected energy consumption, and algorithm solution time. The proposed hybrid approach not only enhances the efficiency of resource allocation but also minimizes processing delays, ultimately contributing to improved performance in vehicular networks.

  • Conference Article
  • Cited by 16
  • 10.1109/cbd54617.2021.00045
Applying Graph Neural Network in Deep Reinforcement Learning to Optimize Wireless Network Routing
  • Mar 1, 2022
  • Xiao Xu + 2 more

At present, the traffic in wireless sensor networks (WSN) is growing at an extremely fast speed, consuming more and more network resources. This undoubtedly affects the transmission performance of WSN. Good and efficient routing technology is one of the key technologies to solve this problem. Limited by the dynamic network state, traditional routing technology faces some problems such as performance degradation and lack of learning ability. In contrast, Deep Reinforcement Learning (DRL), which has the ability of decision-making and online learning, has a better effect in facing the routing optimization problem. DRL can learn routing strategy online or offline through reinforcement learning mechanism and deep neural network. However, the existing routing models based on DRL use fully connected neural networks or convolutional neural networks, and cannot learn the network topology information. This will lead to the failure of the previously trained routing model in the face of a new network. Therefore, under the background that WSN nodes may fail, resulting in topology changes, this paper combines Graph Neural Network (GNN) with DRL, and proposes GRL-NET intelligent routing algorithm. The algorithm uses GNN instead of conventional neural network to construct DRL Agent. With the help of GNN, GRL-NET can not only learn the complex relationship among network topology, traffic and routing from the perspective of network topology, but also run in a network topology that has never appeared before. In order to evaluate the effect of GRL-NET, several groups of experiments were conducted under different traffic intensity. Experimental results show that GRL-NET can not only learn the best routing strategy, but also keep good results in the never-seen network topology.
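
The abstract above credits the GNN with letting a trained agent run on topologies it has never seen. A minimal sketch of why that can work is given here, assuming a simple mean-aggregation message-passing step; it is not GRL-NET's actual architecture, and `message_passing_step`, `w_self`, and `w_neigh` are hypothetical names. Because the two weights are shared by every node, the same function applies unchanged to graphs of different sizes.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]  # node id -> list of neighbour ids


def message_passing_step(graph: Graph, features: Dict[int, float],
                         w_self: float = 0.5, w_neigh: float = 0.5) -> Dict[int, float]:
    """One round of mean-aggregation message passing with shared weights."""
    updated = {}
    for node, neighbours in graph.items():
        # Aggregate: average the neighbours' features (0.0 for an isolated node).
        if neighbours:
            neigh_mean = sum(features[n] for n in neighbours) / len(neighbours)
        else:
            neigh_mean = 0.0
        # Update: combine own feature and aggregated message with the shared weights.
        updated[node] = w_self * features[node] + w_neigh * neigh_mean
    return updated


# The same weights are applied to a 3-node and a 4-node topology.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

tri_out = message_passing_step(triangle, {0: 1.0, 1: 2.0, 2: 3.0})
line_out = message_passing_step(line, {0: 1.0, 1: 0.0, 2: 0.0, 3: 1.0})
print(tri_out)
print(line_out)
```

A fully connected network, by contrast, has an input layer sized to one fixed topology, which is the limitation the abstract attributes to earlier DRL routing models.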

  • Research Article
  • Cited by 7
  • 10.1016/j.ifacol.2020.12.378
Resource Allocation in Large-Scale Wireless Control Systems with Graph Neural Networks
  • Jan 1, 2020
  • IFAC PapersOnLine
  • Vinicius Lima + 3 more

  • Research Article
  • Cited by 14
  • 10.1109/jiot.2024.3443866
TransEdge: Task Offloading With GNN and DRL in Edge-Computing-Enabled Transportation Systems
  • Dec 1, 2024
  • IEEE Internet of Things Journal
  • Aikun Xu + 11 more

In recent years, since edge computing has improved the performance of transportation systems, research on edge-computing-enabled transportation systems has received widespread attention. However, most previous studies overlooked that task requests in transportation systems are unevenly distributed in time and space, which easily causes the overloading of edge servers, resulting in high response latency. To this end, we present a novel task offloading scheme based on graph neural network (GNN) and deep reinforcement learning (DRL) in edge-computing-enabled transportation systems (TransEdge). Specifically, we first propose an adaptive node placement algorithm to assign Internet of Things sensors to appropriate edge servers, thereby minimizing transmission latency. Then, an improved DRL scheme based on GNN is designed to capture the spatial features between sensors, aiming to improve the accuracy of task offloading decisions. Finally, we introduce a task forwarding strategy based on the greedy algorithm to achieve collaborative task offloading between different edge servers and overcome the system instability caused by a sudden surge in task requests. We conduct extensive experiments on two real-world traffic data sets. The results show that TransEdge reduces the response latency by at least 3.7% compared to four baselines while achieving a success rate of 99%.
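
The greedy task forwarding idea mentioned in the abstract above can be illustrated with a short sketch. This is only an assumed reading of "greedy", not the TransEdge algorithm itself: tasks are moved one at a time from any server above a capacity threshold to whichever server currently has the lowest load. The function `greedy_forward`, the integer load model, and the `capacity` parameter are all hypothetical.

```python
from typing import Dict, List, Tuple


def greedy_forward(loads: Dict[str, int], capacity: int) -> List[Tuple[str, str]]:
    """Forward tasks from overloaded servers to the least-loaded server.

    `loads` maps server name -> number of queued tasks and is updated in place.
    Returns the list of (source, destination) moves that were made.
    """
    moves = []
    for src in sorted(loads):
        while loads[src] > capacity:
            # Greedy choice: the server with the lowest current load (ties by name).
            dst = min(loads, key=lambda s: (loads[s], s))
            if loads[dst] + 1 > capacity:
                break  # no server has spare capacity, so stop forwarding from src
            loads[src] -= 1
            loads[dst] += 1
            moves.append((src, dst))
    return moves


loads = {"a": 5, "b": 1, "c": 2}
moves = greedy_forward(loads, capacity=3)
print(moves)
print(loads)
```

The greedy rule is cheap to evaluate on every surge, which is the usual reason to prefer it over a global optimization when requests arrive in bursts.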

  • Research Article
  • Cited by 6
  • 10.3390/aerospace11070511
A Graph Reinforcement Learning-Based Handover Strategy for Low Earth Orbit Satellites under Power Grid Scenarios
  • Jun 24, 2024
  • Aerospace
  • Haizhi Yu + 2 more

Amidst the escalating need for stable power supplies and high-quality communication services in remote regions globally, due to challenges associated with deploying a conventional power communication infrastructure and its susceptibility to natural disasters, LEO satellite networks present a promising solution for broad geographical coverage and the provision of stable and high-speed communication services in remote regions. Given the necessity for frequent handovers to maintain service continuity, due to the high mobility of LEO satellites, a primary technical challenge confronting LEO satellite networks lies in efficiently managing the handover process between satellites, to guarantee the continuity and quality of communication services, particularly for power services. Thus, there is a critical need to explore satellite handover optimization algorithms. This paper presents a handover optimization scheme that integrates deep reinforcement learning (DRL) and graph neural networks (GNN) to dynamically optimize the satellite handover process and adapt to the time-varying satellite network environment. DRL models can effectively detect changes in the topology of satellite handover graphs across different time periods by leveraging the powerful representational capabilities of GNNs to make optimal handover decisions. Simulation experiments confirm that the handover strategy based on the fusion of message-passing neural network and deep Q-network algorithm (MPNN-DQN) outperforms traditional handover mechanisms and DRL-based strategies in reducing handover frequency, lowering communication latency, and achieving network load balancing. Integrating DRL and GNN into the satellite handover mechanism enhances the communication continuity and reliability of power systems in remote areas, while also offering a new direction for the design and optimization of future power system communication networks. 
This research contributes to the advancement of sophisticated satellite communication architectures that facilitate high-speed and reliable internet access in remote regions worldwide.

  • Conference Article
  • 10.18690/um.fov.2.2025.14
Graph Neural Networks and Deep Reinforcement Learning in Warehouse Order Picking and Batching - Literature Review
  • Mar 19, 2025
  • Nejc Čelik + 1 more

This paper is a systematic literature review on the use of Deep Reinforcement Learning (DRL) and Graph Neural Networks (GNNs) in warehouses. We first explore the use of DRL and GNNs for optimizing order picking and batching in warehouses. Because we found very few results on the use of GNNs for optimizing order picking and batching, we extended our search to the general use of GNNs in warehouse environments. We identified the different research topics using Latent Dirichlet Allocation (LDA) and identified the main problems in the use of DRL and GNNs in warehouse environments.

  • Research Article
  • Cited by 6
  • 10.1360/n972016-00741
Break through the limits of learning by machines
  • Sep 20, 2016
  • Chinese Science Bulletin
  • Zhongzhi Shi

Learning ability is the basic characteristic of human intelligence. The July 1, 2005 issue of Science published a list of 125 important questions in science. Among them was question 94, “What are the limits of learning by machines?”, with the annotation “Computers can already beat the world’s best chess players, and they have a wealth of information on the Web to draw on. But abstract reasoning is still beyond any machine”. In recent years, artificial intelligence has made great progress. In 1997, in a landmark man-versus-machine match, IBM’s supercomputer Deep Blue defeated the chess master Garry Kasparov. On February 14, 2011, IBM’s Watson supercomputer won a practice round against Jeopardy champions Ken Jennings and Brad Rutter. In March 2016, Google DeepMind’s AlphaGo sealed a 4-1 victory over the South Korean Go grandmaster Lee Se-dol. This paper focuses on the machine learning methods of AlphaGo, including reinforcement learning, deep learning, and deep reinforcement learning, and analyzes the existing problems and the latest research progress. Deep reinforcement learning is the combination of deep learning and reinforcement learning, which makes it possible to learn a mapping directly from perception to action. Put simply, this mirrors human behavior: sensory information such as vision is taken as input, and an action is output directly through the deep neural network. Deep reinforcement learning has the potential to let a robot learn a variety of skills and achieve full autonomy. Although reinforcement learning has been applied successfully, state features need to be set manually, which is difficult for complex scenes, easily leads to the curse of dimensionality, and yields poor representations. In 2010, Sascha Lange and Martin Riedmiller proposed using deep auto-encoder neural networks in reinforcement learning to extract features, which are used for vision-related control tasks. In 2013, DeepMind proposed the deep Q-network (DQN) at NIPS 2013, using a convolutional neural network to extract features that are then used in reinforcement learning. They continued to improve it and published an improved version of DQN in Nature in 2015, which attracted widespread attention. In order to break through the limits of learning by machines, cognitive machine learning is proposed, which is the combination of machine learning and brain cognition, so that machine intelligence keeps evolving and gradually reaches human-level artificial intelligence. A cognitive model entitled Consciousness And Memory (CAM) is proposed by the author, which consists of memory, consciousness, high-level cognitive functions, perception, and motor functions. High-level cognitive functions of the brain include learning, language, thinking, decision making, emotion, and so on. Learning is the process of receiving stimuli through the nervous system and acquiring new behaviors, habits, and accumulated experience. According to the current research progress of brain science and cognitive science, cognitive machine learning may focus on learning emergence, procedural memory knowledge learning, learning evolution, and so on. For intelligence, evolution refers to learning how to learn, with the structure changing accordingly. It is important to record the learning result through structural change and to improve the learning method.

  • Research Article
  • 10.31449/inf.v48i22.6943
Automatic Network Traffic Scheduling Algorithm Based on Deep Reinforcement Learning
  • Dec 6, 2024
  • Informatica
  • Huiling He

This paper proposes an intelligent network traffic scheduling algorithm based on deep reinforcement learning and graph neural network (GNN) to solve traffic scheduling problems in large-scale dynamic network environments. The algorithm combines the decision-making ability of deep reinforcement learning and the advantage of GNNs in processing graph structure data. Through hierarchical reinforcement learning framework, it realizes efficient decision-making process from macro-strategy formulation to micro-operation execution. Experimental results show that compared with traditional algorithms, the proposed algorithm has significant advantages in key performance indicators such as average delay time, throughput and resource utilization. The algorithm not only surpasses Dijkstra, Shortest Path First (SPF) and Weighted Round Robin (WRR) algorithms under standard test conditions, but also shows excellent robustness and generalization ability under complex scenarios such as different traffic demand intensity, link failure and network topology change. In addition, through model optimization and parameter adjustment, the convergence speed and learning efficiency of the algorithm are significantly improved when dealing with large-scale networks, which provides strong technical support for automatic network traffic management.

  • Research Article
  • Cited by 1
  • 10.4233/uuid:f8faacb0-9a55-453d-97fd-0388a3c848ee
Sample efficient deep reinforcement learning for control
  • Dec 15, 2019
  • Research Repository (Delft University of Technology)
  • Tim De Bruin

The arrival of intelligent, general-purpose robots that can learn to perform new tasks autonomously has been promised for a long time now. Deep reinforcement learning, which combines reinforcement learning with deep neural network function approximation, has the potential to enable robots to learn to perform a wide range of new tasks while requiring very little prior knowledge or human help. This framework might therefore help to finally make general purpose robots a reality. However, the biggest successes of deep reinforcement learning have so far been in simulated game settings. To translate these successes to the real world, significant improvements are needed in the ability of these methods to learn quickly and safely. This thesis investigates what is needed to make this possible and makes contributions towards this goal.

Before deep reinforcement learning methods can be successfully applied in the robotics domain, an understanding is needed of how, when, and why deep learning and reinforcement learning work well together. This thesis therefore starts with a literature review, which is presented in Chapter 2. While the field is still in some regards in its infancy, it can already be noted that there are important components that are shared by successful algorithms. These components help to reconcile the differences between classical reinforcement learning methods and the training procedures used to successfully train deep neural networks. The main challenges in combining deep learning with reinforcement learning center around the interdependencies of the policy, the training data, and the training targets. Commonly used tools for managing the detrimental effects caused by these interdependencies include target networks, trust region updates, and experience replay buffers. Besides reviewing these components, a number of the more popular and historically relevant deep reinforcement learning methods are discussed.

Reinforcement learning involves learning through trial and error. However, robots (and their surroundings) are fragile, which makes these trials, and especially errors, very costly. Therefore, the amount of exploration that is performed will often need to be drastically reduced over time, especially once a reasonable behavior has already been found. We demonstrate how, using common experience replay techniques, this can quickly lead to forgetting previously learned successful behaviors. This problem is investigated in Chapter 3. Experiments are conducted to investigate what distribution of the experiences over the state-action space leads to desirable learning behavior and what distributions can cause problems. It is shown how actor-critic algorithms are especially sensitive to the lack of diversity in the action space that can result from reducing the amount of exploration over time. Further relations between the properties of the control problem at hand and the required data distributions are also shown. These include a larger need for diversity in the action space when control frequencies are high and a reduced importance of data diversity for problems where generalizing the control strategy across the state-space is more difficult.

While Chapter 3 investigates what data distributions are most beneficial, Chapter 4 instead proposes practical algorithms to select useful experiences from a stream of experiences. We do not assume to have any control over the stream of experiences, which makes it possible to learn from additional sources of experience like other robots, experiences obtained while learning different tasks, and experiences obtained using predefined controllers. We make two separate judgments on the utility of individual experiences. The first judgment is on the long term utility of experiences, which is used to determine which experiences to keep in memory once the experience buffer is full. The second judgment is on the instantaneous utility of the experience to the learning agent. This judgment is used to determine which experiences should be sampled from the buffer to be learned from. To estimate the short and long term utility of the experiences we propose proxies based on the age, surprise, and the exploration intensity associated with the experiences. It is shown how prior knowledge of the control problem at hand can be used to decide which proxies to use. We additionally show how the knowledge of the control problem can be used to estimate the optimal size of the experience buffer and whether or not to use importance sampling to compensate for the bias introduced by the selection procedure. Together, these choices can lead to a more stable learning procedure and better performing controllers.

In Chapter 5 we look at what to learn from the collected data. The high price of data in the robotics domain makes it crucial to extract as much knowledge as possible from each and every datum. Reinforcement learning, by default, does not do so. We therefore supplement reinforcement learning with explicit state representation learning objectives. These objectives are based on the assumption that the neural network controller that is to be learned can be seen as consisting of two consecutive parts. The first part (referred to as the state encoder) maps the observed sensor data to a compact and concise representation of the state of the robot and its environment. The second part determines which actions to take based on this state representation. As the representation of the state of the world is useful for more than just completing the task at hand, it can also be trained with more general (state representation learning) objectives than just the reinforcement learning objective associated with the current task. We show how including these additional training objectives allows for learning a much more general state representation, which in turn makes it possible to learn broadly applicable control strategies more quickly. We also introduce a training method that ensures that the added learning objectives further the goal of reinforcement learning, without destabilizing the learning process through their changes to the state encoder.

The final contribution of this thesis, presented in Chapter 6, focuses on the optimization procedure used to train the second part of the policy: the mapping from the state representation to the actions. While we show that the state encoder can be efficiently trained with standard gradient-based optimization techniques, perfecting this second mapping is more difficult. Obtaining high quality estimates of the gradients of the policy performance with respect to the parameters of this part of the neural network is usually not feasible. This means that while a reasonable policy can be obtained relatively quickly using gradient-based optimization approaches, this speed comes at the cost of the stability of the learning process as well as the final performance of the controller. Additionally, the unstable nature of this learning process brings with it an extreme sensitivity to the values of the hyper-parameters of the training method. This places an unfortunate emphasis on hyper-parameter tuning for getting deep reinforcement learning algorithms to work well. Gradient-free optimization algorithms can be simpler and more stable, but tend to be much less sample efficient. We show how the desirable aspects of both methods can be combined by first training the entire network through gradient-based optimization and subsequently fine-tuning the final part of the network in a gradient-free manner. We demonstrate how this enables the policy to improve in a stable manner to a performance level not obtained by gradient-based optimization alone, using many fewer trials than methods using only gradient-free optimization.
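
The two utility judgments described for Chapter 4 (which experiences to keep once the buffer is full, and which to sample for learning) can be sketched as a small replay buffer. This is only an illustration under assumed choices, not the thesis's algorithms: here retention uses surprise as the long-term proxy and sampling favours recent experiences via an age-based weight. The class `UtilityBuffer`, the field names, and the weight formula `1 / (1 + age)` are all hypothetical, and importance-sampling correction is left out.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Experience:
    transition: object   # e.g. (state, action, reward, next_state); opaque here
    surprise: float      # assumed to be supplied by the learner, e.g. |TD error|
    step_added: int      # insertion time, used to compute age


class UtilityBuffer:
    def __init__(self, capacity: int, seed: int = 0):
        self.capacity = capacity
        self.items: List[Experience] = []
        self.step = 0
        self.rng = random.Random(seed)

    def add(self, transition: object, surprise: float) -> None:
        exp = Experience(transition, surprise, self.step)
        self.step += 1
        if len(self.items) < self.capacity:
            self.items.append(exp)
            return
        # Long-term judgment (assumed proxy: surprise): when full, overwrite the
        # least surprising stored experience, but only if the new one beats it.
        weakest = min(range(len(self.items)), key=lambda i: self.items[i].surprise)
        if exp.surprise > self.items[weakest].surprise:
            self.items[weakest] = exp

    def sample(self, k: int) -> List[Experience]:
        # Instantaneous judgment (assumed proxy: age): newer experiences get
        # a larger sampling weight; sampling is with replacement.
        weights = [1.0 / (1 + self.step - e.step_added) for e in self.items]
        return self.rng.choices(self.items, weights=weights, k=k)


buf = UtilityBuffer(capacity=2)
for name, surprise in [("a", 0.1), ("b", 0.9), ("c", 0.5), ("d", 0.2)]:
    buf.add(name, surprise)
kept = sorted(e.transition for e in buf.items)
print(kept)
```

Keeping the retention and sampling rules as separate methods mirrors the abstract's point that the two judgments are made independently and can use different proxies.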

  • Research Article
  • Cited by 33
  • 10.3390/pr11051571
Combining Reinforcement Learning Algorithms with Graph Neural Networks to Solve Dynamic Job Shop Scheduling Problems
  • May 21, 2023
  • Processes
  • Zhong Yang + 2 more

Smart factories have attracted a lot of attention from scholars for intelligent scheduling problems due to the complexity and dynamics of their production processes. The dynamic job shop scheduling problem (DJSP), as one of the intelligent scheduling problems, aims to make an optimized scheduling decision sequence based on the real-time dynamic job shop environment. The traditional reinforcement learning (RL) method converts the scheduling problem with a Markov process and combines its own reward method to obtain scheduling sequences in different real-time shop states. However, the definition of shop states often relies on the scheduling experience of the model constructor, which undoubtedly affects the optimization capability of the reinforcement learning model. In this paper, we combine graph neural network (GNN) and deep reinforcement learning (DRL) algorithm to solve DJSP. An agent model from job shop state analysis graph to scheduling rules is constructed, thus avoiding the problem that traditional reinforcement learning methods rely on scheduling experience to artificially set the state feature vectors. In addition, a new reward function is defined, and the experimental results prove that our proposed reward method is more effective. The effectiveness and feasibility of our model is demonstrated by comparing with general deep reinforcement learning algorithms on minimizing the earlier and later completion time, which also lays the foundation for solving the DJSP later.
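
The abstract above describes an agent that maps the shop state to scheduling rules rather than to individual jobs. A minimal sketch of that kind of action space follows, assuming three classic dispatching rules (FIFO, SPT, EDD) chosen for illustration; the paper's actual rule set, state graph, and reward are not given here, and `Job`, `RULES`, and `dispatch` are hypothetical names. A learned policy would pick the rule name; here each rule is simply applied in turn.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Job:
    name: str
    arrival: int
    processing_time: int
    due: int


# Each action is a dispatching rule: a priority key where lower means "run first".
RULES: Dict[str, Callable[[Job], int]] = {
    "FIFO": lambda j: j.arrival,          # first in, first out
    "SPT": lambda j: j.processing_time,   # shortest processing time
    "EDD": lambda j: j.due,               # earliest due date
}


def dispatch(queue: List[Job], rule: str) -> Job:
    """Return the job the chosen rule would start next (ties broken by name)."""
    key = RULES[rule]
    return min(queue, key=lambda j: (key(j), j.name))


queue = [
    Job("J1", arrival=0, processing_time=7, due=20),
    Job("J2", arrival=1, processing_time=3, due=30),
    Job("J3", arrival=2, processing_time=5, due=10),
]

choices = {rule: dispatch(queue, rule).name for rule in RULES}
print(choices)
```

Choosing among a handful of rules keeps the action space fixed in size even as the number of waiting jobs changes, which is one common motivation for rule-based actions in dynamic scheduling.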

  • Research Article
  • Cited by 229
  • 10.1007/s10462-021-10061-9
Deep reinforcement learning in computer vision: a comprehensive survey
  • Sep 29, 2021
  • Artificial Intelligence Review
  • Ngan Le + 4 more

Deep reinforcement learning augments the reinforcement learning framework and utilizes the powerful representation of deep neural networks. Recent works have demonstrated the remarkable successes of deep reinforcement learning in various domains including finance, medicine, healthcare, video games, robotics, and computer vision. In this work, we provide a detailed review of recent and state-of-the-art research advances of deep reinforcement learning in computer vision. We start with comprehending the theories of deep learning, reinforcement learning, and deep reinforcement learning. We then propose a categorization of deep reinforcement learning methodologies and discuss their advantages and limitations. In particular, we divide deep reinforcement learning into seven main categories according to their applications in computer vision, i.e. (i) landmark localization (ii) object detection; (iii) object tracking; (iv) registration on both 2D image and 3D image volumetric data (v) image segmentation; (vi) videos analysis; and (vii) other applications. Each of these categories is further analyzed with reinforcement learning techniques, network design, and performance. Moreover, we provide a comprehensive analysis of the existing publicly available datasets and examine source code availability. Finally, we present some open issues and discuss future research directions on deep reinforcement learning in computer vision.
