Optimizing dynamic cellular manufacturing system: a deep reinforcement learning approach to profit maximization and inventory management
ABSTRACT This paper presents a novel dynamic cellular manufacturing system that incorporates order rejection, tardiness costs, and the costs of purchasing and holding raw materials. Orders arrive at different times, requiring real-time decisions on acceptance or rejection. Accepted orders necessitate raw materials, which must be procured at the optimal time. A mathematical model with two objectives—maximizing profit and minimizing the number of rejected orders—is proposed. To solve this problem, an iteration-based hierarchical solution method with three steps has been developed. First, machines are assigned to cells using a genetic algorithm. Next, a deep reinforcement learning (DRL) algorithm with a dual network is employed to manage order acceptance, schedule operations for accepted orders, assign them to the most suitable machines, procure raw materials, and determine optimal safety stock levels. The inventory management within the DRL framework is further supported by an artificial neural network. Finally, a boxing match algorithm is introduced to optimize machine placement based on DRL outputs. A case study was conducted to evaluate the performance of the proposed method, and comparisons between real-world data and the algorithm's output demonstrate the effectiveness of the proposed approach.
- Research Article
- 10.1016/j.engappai.2024.108925
- Jul 17, 2024
- Engineering Applications of Artificial Intelligence
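The first stage of the hierarchical method above (assigning machines to cells with a genetic algorithm) can be sketched as follows; the routings, cell capacity, and fitness function are hypothetical placeholders for illustration, not the paper's model:

```python
import random

random.seed(0)

N_MACHINES, N_CELLS, CAP = 8, 2, 4
# Hypothetical part routings: each part visits a set of machines.
ROUTINGS = [{0, 1, 2}, {1, 2, 3}, {4, 5}, {5, 6, 7}, {4, 6, 7}]

def fitness(chrom):
    """Penalize intercell moves (a part pays one move per extra cell its
    machines span) and overfull cells; higher is better."""
    moves = sum(len({chrom[m] for m in part}) - 1 for part in ROUTINGS)
    overflow = sum(max(0, chrom.count(c) - CAP) for c in range(N_CELLS))
    return -(moves + 10 * overflow)

def evolve(pop_size=30, generations=60):
    pop = [[random.randrange(N_CELLS) for _ in range(N_MACHINES)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[:pop_size // 2]            # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_MACHINES)  # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:              # point mutation
                child[random.randrange(N_MACHINES)] = random.randrange(N_CELLS)
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()  # best machine-to-cell assignment found
```

With these routings the ideal split is machines 0-3 in one cell and 4-7 in the other, which incurs zero intercell moves.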
Harnessing deep reinforcement learning algorithms for image categorization: A multi-algorithm approach
- Research Article
- 10.1002/dac.5411
- Dec 27, 2022
- International Journal of Communication Systems
Fog computing has started to gain a lot of momentum in industry for its ability to turn scattered computing resources into a large-scale, virtualized, and elastic computing environment. Resource management (RM) is one of the key challenges in fog computing and is central to its success. Deep learning has been applied to fog computing for some time and is widely used in large-scale network RM. Reinforcement learning (RL) is a class of machine learning algorithms that learn and make decisions based on reward signals obtained from interactions with the environment. We examine current research in this area (published between 2013 and 2022), comparing RL and deep reinforcement learning (DRL) approaches with traditional algorithmic methods such as graph theory, heuristics, and greedy methods for managing resources in fog computing environments, and illustrate how RL and DRL algorithms can be more effective than conventional techniques. Various DRL-based algorithms have been shown to be applicable to the RM problem and have demonstrated considerable potential in fog computing. A new microservice model based on the DRL framework is proposed to achieve efficient fog computing RM. The positive impact of this work is that it provides a resource manager that can efficiently schedule resources and maximize overall performance.
- Research Article
- 10.1088/1361-6560/ac9cb3
- Nov 11, 2022
- Physics in Medicine & Biology
Reinforcement learning takes a sequential decision-making approach, learning a policy through trial and error based on interaction with the environment. Combining deep learning and reinforcement learning empowers the agent to learn the interactions and the distribution of rewards from state-action pairs, achieving effective and efficient solutions in more complex and dynamic environments. Deep reinforcement learning (DRL) has demonstrated astonishing performance, surpassing human-level play in the game domain and many other simulated environments. This paper introduces the basics of reinforcement learning and reviews various categories of DRL algorithms and DRL models developed for medical image analysis and radiation treatment planning optimization. We also discuss the current challenges of DRL and approaches proposed to make DRL more generalizable and robust in real-world environments. By fostering the design of reward functions, agent interactions, and environment models, DRL algorithms can resolve the challenges posed by scarce and heterogeneous annotated medical image data, which has been a major obstacle to implementing deep learning models in the clinic. DRL is an active research area with enormous potential to improve deep learning applications in medical imaging and radiation therapy planning.
- Research Article
- 10.3390/en17246454
- Dec 21, 2024
- Energies
Aiming at the load fluctuation problem caused by a high proportion of new-energy grid connections, a reactive power optimization method based on deep reinforcement learning (DRL) that considers topological characteristics is proposed. The method transforms the reactive power optimization problem into a Markov decision process and models and solves it within a deep reinforcement learning framework. The Dueling Double Deep Q-Network (D3QN) algorithm is adopted to improve the accuracy and efficiency of the calculation. To address the difficulty deep reinforcement learning algorithms have in capturing the topological characteristics of power flow, the Graph Convolutional Dueling Double Deep Q-Network (GCD3QN) algorithm is proposed. A graph convolutional neural network (GCN) is integrated into the D3QN model, and information aggregation over topological nodes is realized through the graph convolution operator, which solves the problem of applying deep learning algorithms in non-Euclidean spaces and improves the accuracy of reactive power optimization. The IEEE standard node system is used for simulation analysis, and the effectiveness of the proposed method is verified.
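The Dueling Double DQN (D3QN) target above combines two standard ingredients: a dueling value/advantage head and double-Q action selection. A minimal pure-Python sketch with toy numbers standing in for the network outputs (illustrative only, not the paper's networks):

```python
def dueling_q(value, advantages):
    """Dueling head: Q(s,a) = V(s) + A(s,a) - mean_a' A(s,a')."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

def double_dqn_target(reward, gamma, q_online_next, q_target_next, done):
    """Double DQN: the online net selects the next action, the target
    net evaluates it, which reduces plain DQN's overestimation bias."""
    if done:
        return reward
    a_star = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[a_star]

# Toy next-state outputs from the two heads (hypothetical numbers).
q_online = dueling_q(1.0, [0.0, 1.0, 0.0, 0.0])   # argmax picks action 1
q_target = dueling_q(0.5, [0.0, 1.0, 0.0, 0.0])
target = double_dqn_target(1.0, 0.9, q_online, q_target, done=False)
```

The GCD3QN variant would replace the toy head outputs with features aggregated by graph convolutions over the grid topology.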
- Research Article
- 10.3390/math13050754
- Feb 25, 2025
- Mathematics
The development of artificial intelligence (AI) game agents that use deep reinforcement learning (DRL) algorithms to process visual information for decision-making has emerged as a key research focus in both academia and industry. However, previous game agents have struggled to execute multiple commands simultaneously in a single decision, failing to accurately replicate the complex control patterns that characterize human gameplay. In this paper, we utilize the ViZDoom environment as the DRL research platform and transform the agent-environment interactions into a Partially Observable Markov Decision Process (POMDP). We introduce an advanced multi-agent DRL framework, Multi-Agent Proximal Policy Optimization (MA-PPO), designed to optimize target acquisition while operating within defined ammunition and time constraints. In MA-PPO, each agent handles distinct parallel tasks with custom reward functions for performance evaluation. The agents make independent decisions while simultaneously executing multiple commands to mimic human-like gameplay behavior. Our evaluation compares MA-PPO against other DRL algorithms, showing a 30.67% performance improvement over the baseline algorithm.
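The clipped surrogate objective at the heart of PPO, and hence of the MA-PPO variant above, can be sketched as follows; in the multi-agent setting each agent would evaluate this loss with the advantage from its own custom reward:

```python
import math

def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    """PPO clipped surrogate (negated for gradient descent):
    L = -min(r * A, clip(r, 1 - eps, 1 + eps) * A), r = pi_new / pi_old."""
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1 - eps), 1 + eps)
    return -min(ratio * advantage, clipped * advantage)

# When the new policy moves too far (ratio 2.0) on a positive-advantage
# action, the clip caps the incentive at 1 + eps = 1.2.
loss = ppo_clip_loss(math.log(2.0), math.log(1.0), advantage=1.0)
```

The clipping keeps each policy update close to the data-collecting policy, which is what makes PPO stable enough to train several agents in parallel.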
- Research Article
- 10.1007/s10898-024-01364-6
- Feb 15, 2024
- Journal of Global Optimization
In this paper, we address the difficulty of solving large-scale multi-dimensional knapsack problem (MKP) instances, presenting a novel deep reinforcement learning (DRL) framework. In this framework, we train different agents compatible with a discrete action space for sequential decision-making while still satisfying every resource constraint of the MKP. The framework incorporates the decision variable values in a 2D DRL environment where the agent is responsible for assigning a value of 1 or 0 to each of the variables. To the best of our knowledge, this is the first DRL model of its kind in which a 2D environment is formulated and an element of the DRL solution matrix represents an item of the MKP. Our framework is configured to solve MKP instances of different dimensions and distributions. We propose a K-means approach to obtain an initial feasible solution that is used to train the DRL agent. We train four different agents in our framework and present results comparing each of them with the CPLEX commercial solver. The results show that our agents can learn and generalize over instances of different sizes and distributions. Our DRL framework can solve medium-sized instances at least 45 times faster in CPU solution time, and large instances at least 10 times faster, with a maximum solution gap of 0.28% relative to CPLEX. Furthermore, at least 95% of the items are predicted in line with the CPLEX solution. Computations with DRL also provide a better optimality gap with respect to state-of-the-art approaches.
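The sequential 0/1 assignment under resource feasibility described above can be sketched on a toy two-constraint instance; the numbers are illustrative, and the K-means warm start and trained agents are omitted:

```python
# Hypothetical 2-constraint MKP instance: item values, one weight row
# per resource, and the per-resource capacities.
VALUES  = [10, 7, 4, 3]
WEIGHTS = [[5, 4, 3, 2],   # resource 0 consumption per item
           [4, 5, 2, 3]]   # resource 1 consumption per item
CAPACITY = [8, 8]

def step(solution, item, action):
    """Assign 0/1 to one item; a 1 that would break any resource
    constraint is masked back to 0, keeping the episode feasible."""
    if action == 1:
        used = [sum(w[i] for i, x in enumerate(solution) if x) + w[item]
                for w in WEIGHTS]
        if any(u > c for u, c in zip(used, CAPACITY)):
            action = 0
    return solution + [action]

def rollout(actions):
    """Play an agent's action sequence through the masked environment."""
    sol = []
    for item, a in enumerate(actions):
        sol = step(sol, item, a)
    value = sum(v for v, x in zip(VALUES, sol) if x)
    return sol, value

# Even a greedy "take everything" policy yields a feasible solution.
sol, value = rollout([1, 1, 1, 1])
```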
- Research Article
- 10.1109/tai.2022.3186292
- Jun 1, 2023
- IEEE Transactions on Artificial Intelligence
The dynamic pricing problem is difficult due to the highly dynamic environment and unknown demand distributions. In this paper, we propose a Deep Reinforcement Learning (DRL) framework: a pipeline that automatically defines the DRL components for solving a dynamic pricing problem. The automated DRL pipeline is necessary because the DRL framework can be designed in numerous ways, and manually finding optimal configurations is tedious. This automation makes non-experts capable of using DRL for dynamic pricing. Our DRL pipeline contains three steps of DRL design: MDP modeling, algorithm selection, and hyper-parameter optimization. It starts by transforming the available information into a state representation and defining the reward function using a reward shaping approach. Then, the hyper-parameters are tuned using a novel hyper-parameter optimization method that integrates Bayesian Optimization with the selection operator of the genetic algorithm. We employ our DRL pipeline on reserve price optimization problems in online advertising as a case study. We show that using the DRL configuration obtained by our pipeline, a pricing policy is obtained whose revenue is significantly higher than that of the benchmark methods. The evaluation is performed by developing a simulation of the real-time bidding (RTB) environment that makes exploration possible for the RL agent.
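The hybrid hyper-parameter step can be sketched generically: score a population of configurations, keep an elite via a GA-style selection operator, and propose new candidates near the survivors (a crude stand-in for the Bayesian Optimization proposal step). The objective below is a hypothetical surrogate for revenue, not the paper's:

```python
import random

random.seed(1)

def objective(lr, gamma):
    """Hypothetical stand-in for policy revenue as a function of two
    hyper-parameters; the real signal would come from DRL training."""
    return -(lr - 0.003) ** 2 * 1e5 - (gamma - 0.99) ** 2 * 1e2

def tune(iters=40, pop=8, k=3):
    configs = [(random.uniform(1e-4, 1e-2), random.uniform(0.9, 0.999))
               for _ in range(pop)]
    for _ in range(iters):
        configs.sort(key=lambda c: objective(*c), reverse=True)
        elite = configs[:k]                       # GA-style selection
        configs = elite + [                       # propose near survivors
            (max(1e-4, random.gauss(lr, 1e-3)),
             min(0.999, random.gauss(g, 0.01)))
            for lr, g in random.choices(elite, k=pop - k)
        ]
    return max(configs, key=lambda c: objective(*c))

best = tune()  # (learning rate, discount factor) found by the search
```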
- Research Article
- 10.23919/jsee.2021.000126
- Dec 1, 2021
- Journal of Systems Engineering and Electronics
This paper presents a deep reinforcement learning (DRL)-based motion control method to provide unmanned aerial vehicles (UAVs) with additional flexibility while flying across dynamic unknown environments autonomously. This method is applicable in both military and civilian fields such as penetration and rescue. The autonomous motion control problem is addressed through motion planning, action interpretation, trajectory tracking, and vehicle movement within the DRL framework. Novel DRL algorithms are presented by combining two difference-amplifying approaches with traditional DRL methods and are used for solving the motion planning problem. An improved Lyapunov guidance vector field (LGVF) method is used to handle the trajectory-tracking problem and provide guidance control commands for the UAV. In contrast to conventional motion-control approaches, the proposed methods directly map the sensor-based detections and measurements into control signals for the inner loop of the UAV, i.e., an end-to-end control. The training experiment results show that the novel DRL algorithms provide more than a 20% performance improvement over the state-of-the-art DRL algorithms. The testing experiment results demonstrate that the controller based on the novel DRL and LGVF, which is trained only once in a static environment, enables the UAV to fly autonomously in various dynamic unknown environments. Thus, the proposed technique provides strong flexibility for the controller.
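For reference, the standard Lyapunov guidance vector field for loitering on a circle of radius rd, which the paper's improved LGVF builds on, can be sketched as follows (the paper's improvements are not reproduced here; v0 and rd values are illustrative):

```python
import math

def lgvf(x, y, rd=100.0, v0=10.0):
    """Standard Lyapunov guidance vector field for loitering on a circle
    of radius rd about the origin: far away the field points inward, and
    on the circle it is purely tangential with speed v0."""
    r = math.hypot(x, y)
    d = r * (r ** 2 + rd ** 2)
    vx = -v0 * (x * (r ** 2 - rd ** 2) + y * 2 * rd * r) / d
    vy = -v0 * (y * (r ** 2 - rd ** 2) - x * 2 * rd * r) / d
    return vx, vy

on_circle = lgvf(100.0, 0.0)   # tangential, speed v0
far_away = lgvf(1000.0, 0.0)   # mostly inward, toward the loiter circle
```

The commanded velocity at each position serves as the guidance signal the tracking controller follows.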
- Research Article
- 10.1007/s11269-020-02600-w
- Jun 27, 2020
- Water Resources Management
This paper develops a deep reinforcement learning (DRL) framework for intelligent operation of cascaded hydropower reservoirs considering inflow forecasts, in which two key problems are addressed: large discrete action spaces and the uncertainty of inflow forecasts. A DRL framework is first developed based on a newly defined knowledge-sample form and a deep Q-network (DQN). Then, an aggregation-disaggregation model is used to reduce the multi-dimensional state and action spaces of the cascaded reservoirs. Next, three DRL models are developed to evaluate the performance of the newly defined decision value functions and the modified decision-action selection approach. The DRL methodologies are tested on China's Hun River cascade hydropower reservoir system. The results show that the aggregation-disaggregation model can effectively reduce the dimensions of state and action, which also simplifies the model structure and improves learning efficiency. Bayesian theory in the decision-action selection approach helps address the uncertainty of inflow forecasts, reducing spillage during the wet season. The proposed DRL models outperform the comparison models (i.e., stochastic dynamic programming) in terms of annual hydropower generation and system reliability. This study suggests that DRL has the potential to be implemented in practice to derive optimal operation strategies.
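The aggregation-disaggregation idea (collapse the cascade into one equivalent reservoir for the agent's state and action, then split the aggregate decision back across reservoirs) can be sketched with a simple proportional rule; the proportional split and the numbers are illustrative choices, not the paper's model:

```python
def aggregate(storages):
    """Collapse per-reservoir storages into one aggregate state value,
    shrinking the state space the DQN has to cover."""
    return sum(storages)

def disaggregate(total_release, capacities):
    """Split the aggregate release action across the cascade in
    proportion to reservoir capacity (one simple disaggregation rule)."""
    cap_sum = sum(capacities)
    return [total_release * c / cap_sum for c in capacities]

agg_state = aggregate([120.0, 260.0, 80.0])            # one state, not three
releases = disaggregate(90.0, [100.0, 200.0, 300.0])   # per-reservoir actions
```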
- Research Article
- 10.1109/tits.2021.3055899
- Jul 1, 2022
- IEEE Transactions on Intelligent Transportation Systems
In this paper, a new framework for path tracking is proposed through learning to drive like a human. First, an imitation algorithm (behavior cloning) is adopted to initialize the deep reinforcement learning (DRL) model by learning from professional drivers' experience. Second, a continuous, deterministic, model-free deep reinforcement learning algorithm is adopted to optimize the DRL model online through trial and error. By combining behavior cloning and deep reinforcement learning, the DRL model can quickly learn an effective policy for path tracking using easy-to-measure vehicle state parameters and environment information as inputs. An actor-critic structure is adopted in the DRL algorithm. To speed up the convergence rate of the DRL model and improve the learning effect, we propose a dual-actor-network structure for the two different action outputs (steering wheel angle and vehicle speed), and a chief critic network is built to guide the updating of both actor networks simultaneously. Based on this dual-actor structure, more relevant state information can be selected as the state input for each action output. In addition, a reward mechanism for autonomous driving is presented. Finally, simulation training and experimental tests are carried out, and the results confirm that the proposed framework is more data-efficient than the original algorithm, and that the trained DRL model can track the reference path accurately and generalizes to different roads.
- Research Article
- 10.1088/1742-6596/2405/1/012032
- Dec 1, 2022
- Journal of Physics: Conference Series
In recent years, complex on-orbit operational tasks, such as the maintenance and assembly of spacecraft, have become increasingly common. Traditional robot planning and control methods require precise dynamic models, which are difficult to obtain for on-orbit assembly operations in extreme space environments. For typical space operation tasks such as plug-and-pull operations, a control strategy can be designed by hand; combining such a hand-designed output control strategy with a deep reinforcement learning algorithm reduces the training difficulty, making the learning process more efficient and the training results better. In this paper, a deep residual reinforcement learning algorithm combined with a heuristic control strategy is constructed to train a space manipulator to perform assembly operations in a high-fidelity simulation environment. The experimental data show that the residual deep reinforcement learning algorithm designed in this paper converges rapidly and can complete the on-orbit assembly task with high probability.
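The residual scheme described above, executing a hand-designed baseline plus a learned correction so the agent only has to learn a small delta, can be sketched in one dimension; the proportional controller and toy plant are illustrative stand-ins for the manipulator task:

```python
def heuristic_controller(error):
    """Hand-designed strategy: proportional push toward the mating position."""
    return -0.5 * error

def residual_policy(error, learned_correction):
    """Residual RL: executed action = heuristic action + learned correction."""
    return heuristic_controller(error) + learned_correction

def rollout(gain, error=2.0, steps=5):
    """Toy 1-D insertion plant: the position error moves by the applied
    action each step; `gain` scales a state-proportional correction."""
    for _ in range(steps):
        error += residual_policy(error, gain * error)
    return abs(error)

baseline_err = rollout(0.0)    # heuristic alone
residual_err = rollout(-0.3)   # heuristic plus a learned correction
```

Because the baseline already gets close to the goal, the learned correction only has to refine it, which is what makes residual training fast.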
- Research Article
- 10.1016/j.aiopen.2024.08.005
- Jan 1, 2024
- AI Open
A study of natural robustness of deep reinforcement learning algorithms towards adversarial perturbations
- Conference Article
- 10.1109/icc.2019.8761721
- May 1, 2019
In this paper, a novel deep reinforcement learning (deep-RL) framework is proposed to provide model-free ultra-reliable low-latency communication (URLLC) in the downlink of an orthogonal frequency division multiple access (OFDMA) system. The proposed deep-RL framework can guarantee high end-to-end reliability and low end-to-end latency, under data rate constraints, for each user in the cellular system without any models of, or assumptions on, the users' traffic. Using the proposed model-free approach, the users' traffic is predicted by the deep-RL framework and subsequently used in the resource allocation, irrespective of the actual underlying model. The problem is posed as a power minimization problem under reliability, latency, and rate constraints. To solve this problem using deep-RL, first, the rate of each user is determined. Then, these rates are mapped to the resource block and power allocation vectors of the studied OFDMA system. Finally, the end-to-end reliability and latency of each user are used as feedback to the deep-RL framework. It is shown that at the fixed point of the deep-RL algorithm, the reliability and latency of the users are guaranteed. Simulation results show how the proposed approach can achieve any feasible point in the rate-reliability-latency region, depending on the network and service requirements. For example, for a 7 Mbps rate guarantee, the results show that the proposed algorithm can provide ultra-reliable low-latency communication with a delay of 8 milliseconds and a reliability of 98%.
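For intuition on the rate-to-power mapping step, a single resource block under the textbook Shannon-capacity model would need the following minimum power; this closed-form model is purely illustrative, since the framework above is deliberately model-free:

```python
def min_power(rate_bps, bandwidth_hz, noise_w, channel_gain):
    """Minimum transmit power for a rate guarantee on one resource block
    under the Shannon model: r = B log2(1 + g p / N), hence
    p = (2^(r/B) - 1) N / g. Illustrative only."""
    return (2 ** (rate_bps / bandwidth_hz) - 1) * noise_w / channel_gain

# Hypothetical LTE-like numbers: a 180 kHz resource block, 1 nW noise,
# channel gain 1e-3, and a 180 kbps rate guarantee.
p = min_power(180e3, 180e3, 1e-9, 1e-3)
```

The exponential dependence of power on the rate-to-bandwidth ratio is what makes joint rate/resource-block/power allocation a nontrivial optimization.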
- Research Article
- 10.1109/tmc.2020.3029844
- Oct 9, 2020
- IEEE Transactions on Mobile Computing
Long propagation delay, which causes throughput degradation in underwater acoustic networks (UWANs), is a critical issue in medium access control (MAC) protocol design for UWANs. This paper develops a deep reinforcement learning (DRL)-based MAC protocol for UWANs, referred to as delayed-reward deep-reinforcement learning multiple access (DR-DLMA), to maximize network throughput by judiciously utilizing the time slots made available by propagation delays or left unused by other nodes. In the DR-DLMA design, we first put forth a new DRL algorithm, termed delayed-reward deep Q-network (DR-DQN). We then formulate the multiple access problem in UWANs as a reinforcement learning (RL) problem by defining state, action, and reward in the parlance of RL, thereby realizing the DR-DLMA protocol. In traditional DRL algorithms, e.g., the original DQN algorithm, the agent gets access to the "reward" from the environment immediately after taking an action. In contrast, in our design, the "reward" (i.e., the ACK packet) is only available two one-way propagation delays after the agent takes an action (i.e., transmits a data packet). The essence of DR-DQN is to incorporate the propagation delay into the DRL framework and modify the DRL algorithm accordingly. In addition, to reduce the cost of online training of the deep neural network (DNN), we provide a nimble training mechanism for DR-DQN. The optimal network throughputs in various cases are given as a benchmark. Simulation results show that our DR-DLMA protocol with the nimble training mechanism can: (i) find the optimal transmission strategy when coexisting with other protocols in a heterogeneous environment; (ii) outperform state-of-the-art MAC protocols (e.g., slotted FAMA and DOTS) in a homogeneous environment; and (iii) greatly reduce energy consumption and run time compared with DR-DLMA using a traditional DNN training mechanism.
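The core bookkeeping of DR-DQN, holding each (state, action) pair until its ACK arrives a round trip later and only then emitting a complete training tuple, can be sketched as follows; the slot arithmetic and reward values are illustrative:

```python
PROP_DELAY = 3  # one-way propagation delay, in slots (assumed)

class DelayedRewardBuffer:
    """Hold (state, action) pairs until the ACK, i.e., the reward,
    arrives two one-way propagation delays after transmission, then
    release a complete experience tuple for DQN training."""

    def __init__(self):
        self.pending = {}   # send slot -> (state, action)
        self.ready = []     # completed (state, action, reward) tuples

    def transmit(self, slot, state, action):
        self.pending[slot] = (state, action)

    def tick(self, slot, ack_received):
        send_slot = slot - 2 * PROP_DELAY
        if send_slot in self.pending:
            state, action = self.pending.pop(send_slot)
            reward = 1.0 if ack_received else 0.0
            self.ready.append((state, action, reward))

buf = DelayedRewardBuffer()
buf.transmit(0, "s0", 1)             # transmit a data packet in slot 0
for t in range(1, 6):
    buf.tick(t, ack_received=False)  # nothing matures before the round trip
buf.tick(6, ack_received=True)       # ACK arrives 2 * PROP_DELAY later
```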
- Research Article
- 10.1088/1361-6528/adf64b
- Aug 14, 2025
- Nanotechnology
Prediction of stable nanocluster structures remains a significant challenge in materials and nanocluster research due to the complex nature of potential energy surfaces (PES). To overcome this complexity, a novel deep reinforcement learning (DRL) framework was employed to efficiently scan the PES and identify the global minimum of the Pt13 nanocluster alongside other low-energy configurations. The DRL agent iteratively learns to generate energetically favorable configurations by adjusting atomic positions based on feedback from a reward function designed to promote structural stability and discourage unrealistic geometries, such as overlapping or dissociating atoms. Starting from randomized initial structures, the model successfully identifies the most stable configuration of Pt13 with icosahedral (Ih) symmetry, and the framework reveals 25 distinct low-energy isomers. The successful identification of a stable structure verifies the effectiveness of the DRL framework. Additionally, Density Functional Theory calculations confirm the stability of the Pt13 nanocluster via its cohesive energy: the negative cohesive energy confirms stability, and thermodynamic stability was also assessed at 300 K. The charge, electron localization function, electron density, d-band center, and total density of states indicate that Pt13 nanoclusters exhibit the ideal electronic fingerprint of a highly active nano-catalyst. To further check the DRL framework's adaptability, we performed experiments on Pt10 and Pt18. This study highlights the efficacy of DRL in navigating complex energy landscapes, predicting stable nanocluster configurations, and providing a robust methodology for optimizing nanoclusters.
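The reward shaping described above (favor low energy, punish overlapping or dissociating atoms) can be sketched with a Lennard-Jones pair potential standing in for the true energetics; the paper's stability scores are DFT-level, and the cutoffs below are hypothetical:

```python
import math

def pair_energy(r, eps=1.0, sigma=1.0):
    """Lennard-Jones pair term, a stand-in surrogate for the real
    (DFT-level) energetics used to score cluster stability."""
    return 4 * eps * ((sigma / r) ** 12 - (sigma / r) ** 6)

def reward(positions, min_dist=0.7, max_dist=3.0):
    """Reward lower total energy; heavily penalize overlapping atoms
    (closer than min_dist) and dissociated atoms (nearest neighbor
    farther than max_dist)."""
    e, penalty = 0.0, 0.0
    n = len(positions)
    for i in range(n):
        nearest = float("inf")
        for j in range(n):
            if i == j:
                continue
            d = math.dist(positions[i], positions[j])
            if j > i:
                e += pair_energy(d)
                if d < min_dist:
                    penalty += 10.0   # overlapping atoms
            nearest = min(nearest, d)
        if nearest > max_dist:
            penalty += 10.0           # dissociated atom
    return -e - penalty

# A dimer at the LJ minimum separation 2^(1/6) has pair energy -1,
# so its reward is +1 with no penalties.
r_dimer = reward([(0.0, 0.0, 0.0), (2 ** (1 / 6), 0.0, 0.0)])
```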