Development of a multi-agent adaptive recommendation system based on reinforcement learning

Abstract

The object of this study is the process of improving the efficiency and accuracy of delivering personalized recommendations to users in reinforcement-learning-based systems. The principal task addressed is to improve recommendation adaptation and personalization by assigning a dedicated agent to each user. This approach reduces the influence of other users' activity and allows more precise modeling of individual preferences. The proposed approach employs an Actor–Critic model implemented with the Deep Deterministic Policy Gradient (DDPG) algorithm to achieve more stable training and to maximize long-term reward in sequential decision-making. Recommendations are generated from the unique characteristics of items derived from users' historical interactions. Neural networks are trained with separate parameter configurations for the single-agent and multi-agent models. Experimental results on the MovieLens dataset demonstrate the superiority of the multi-agent model over the single-agent baseline across key evaluation metrics. For top-5 recommendations, the multi-agent model achieved improvements of +4% in Precision@5, +0.32% in Recall@5, and +2.92% in Normalized Discounted Cumulative Gain (NDCG@5). For top-10 recommendations, the gains were +1% in Precision@10, +0.18% in Recall@10, and +1.14% in NDCG@10. Simulations for individual users showed that the multi-agent model outperformed the single-agent baseline in 66 out of 100 cases in terms of cumulative reward. The proposed system is effective at capturing user preferences, improving recommendation quality, and adapting to evolving user preferences over time. The main areas of practical application include dynamic online environments such as e-commerce systems, media platforms, social networks, and news aggregators.
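For reference, the ranking metrics reported above can be computed as follows. This is a minimal sketch of the standard definitions with binary relevance, not the authors' evaluation code:

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    return sum(1 for item in recommended[:k] if item in relevant) / k

def recall_at_k(recommended, relevant, k):
    """Fraction of all relevant items that appear in the top-k list."""
    return sum(1 for item in recommended[:k] if item in relevant) / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Normalized Discounted Cumulative Gain with binary relevance:
    each hit is discounted by log2 of its rank, then the sum is
    normalized by the best achievable ordering (all relevant items first)."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal else 0.0
```

For example, if the top-5 list is [1, 2, 3, 4, 5] and only items {1, 3} are relevant, Precision@5 = 0.4 and Recall@5 = 1.0, while NDCG@5 is below 1.0 because item 3 appears at rank 3 rather than rank 2.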

Similar Papers
  • Research Article
  • Cited by 1
  • 10.1016/j.procs.2024.09.100
Dynamic Resource Allocation of Reinforcement Learning Based on Neural Networks in Software Defined Networks
  • Jan 1, 2024
  • Procedia Computer Science
  • Xinjiu Xie


  • Research Article
  • Cited by 5
  • 10.3390/machines10070496
A State-Compensated Deep Deterministic Policy Gradient Algorithm for UAV Trajectory Tracking
  • Jun 21, 2022
  • Machines
  • Jiying Wu + 5 more

The unmanned aerial vehicle (UAV) trajectory tracking control algorithm based on deep reinforcement learning is generally inefficient to train in an unknown environment, and its convergence is unstable. To address this, a Markov decision process (MDP) model for UAV trajectory tracking is established, and a state-compensated deep deterministic policy gradient (CDDPG) algorithm is proposed. An additional neural network (C-Net), whose input is the compensation state and whose output is the compensation action, is added to the network model of the deep deterministic policy gradient (DDPG) algorithm to assist exploration during training. The action output of the DDPG network is combined with the compensation output of the C-Net to form the action that interacts with the environment, enabling the UAV to rapidly track dynamic targets in the most accurate, continuous, and smooth way possible. In addition, random noise is added to the generated behavior to allow a certain range of exploration and make action-value estimation more accurate. The OpenAI Gym toolkit is used to verify the proposed method, and the simulation results show that: (1) the proposed method significantly improves training efficiency by adding a compensation network, and effectively improves accuracy and convergence stability; (2) under the same computer configuration, the computational cost of the proposed algorithm is essentially the same as that of the QAC algorithm (an actor–critic algorithm based on the behavioral value Q) and the DDPG algorithm; (3) during training, at the same tracking accuracy, the learning efficiency is about 70% higher than that of QAC and DDPG; (4) in the simulation tracking experiment, with the same training time, the tracking error of the proposed method after stabilization is about 50% lower than that of QAC and DDPG.
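The combination of the base policy action with the compensation output and exploration noise described in this abstract can be sketched as follows. This is an illustrative reading only: the additive combination, function names, and action bounds are assumptions, not the paper's exact formulation:

```python
import random

def combined_action(base_action, compensation, noise_scale=0.1, low=-1.0, high=1.0):
    """Combine the DDPG actor's output with a C-Net-style compensation term,
    add Gaussian exploration noise, and clip to the valid action range.
    All names and the additive form are assumptions for illustration."""
    noisy = base_action + compensation + random.gauss(0.0, noise_scale)
    return max(low, min(high, noisy))
```

Setting `noise_scale` to zero recovers the deterministic combined action, while a positive scale gives the bounded exploration the abstract describes.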

  • Conference Article
  • Cited by 4
  • 10.1109/ccta.2019.8920682
Deep Deterministic Policy Gradient-based Parameter Selection Method of Notch Filters for Suppressing Mechanical Resonance in Industrial Servo Systems
  • Aug 1, 2019
  • Tae-Ho Oh + 8 more

This paper presents a parameter selection method of notch filters for suppressing mechanical resonances in industrial servo systems using the deep deterministic policy gradient (DDPG) algorithm. Several methods for tuning notch filter parameters have been studied, such as fast-Fourier-transform-based methods, extended-Kalman-filter-based methods, and adaptive notch filter methods. However, these methods do not find the Q parameters of notch filters, which play an important role in determining system stability, and do not consider cases in which multiple notch filters are required. A deep-Q-network-based method was developed to solve these problems, but its notch filter parameter tuning is limited to discrete action spaces. This paper develops a new parameter selection method for notch filters using the DDPG algorithm, a model-free actor–critic algorithm based on deep neural networks, chosen for its capability to operate over continuous action spaces. Experiments performed on an actual industrial servo system demonstrate that the developed parameter selection method successfully finds notch filter parameters that suppress the resonances of the system.

  • Research Article
  • Cited by 19
  • 10.1155/2022/3139610
A UAV Pursuit-Evasion Strategy Based on DDPG and Imitation Learning
  • May 9, 2022
  • International Journal of Aerospace Engineering
  • Xiaowei Fu + 4 more

The UAV pursuit-evasion strategy based on the Deep Deterministic Policy Gradient (DDPG) algorithm is a current research hotspot. However, this algorithm suffers from low efficiency in sample exploration. To solve this problem, this paper uses imitation learning (IL) to improve the DDPG exploration strategy. A quasi-proportional guidance control law is designed to generate effective learning samples, which are used as the data of the initial experience pool of the DDPG algorithm. A UAV pursuit-evasion strategy based on DDPG and imitation learning (IL-DDPG) is proposed; the algorithm draws data from the experience pool for experience replay learning, which improves exploration efficiency in the initial stage of training and avoids excessive useless exploration during the training process. The simulation results show that the trained pursuit-UAV can flexibly adjust its flight speed and flight attitude to pursue the evasion-UAV quickly. They also verify that the improved DDPG algorithm is more effective than the basic DDPG algorithm at improving training efficiency.

  • Research Article
  • 10.1088/1742-6596/2472/1/012010
Transition state performance optimization of propfan engine based on DDPG algorithm
  • May 1, 2023
  • Journal of Physics: Conference Series
  • Hua Zheng + 3 more

As the “heart” of an aircraft, aero-engines work for long periods in harsh high-temperature, high-pressure environments. To ensure that the engine can operate safely and reliably within the entire flight envelope, a large safety margin must be reserved in the design of the control system. This design approach limits the full exploitation of engine performance, so research on performance-seeking control (PSC) of aero-engines is necessary. This paper studies performance optimization control of a propfan engine based on a deep reinforcement learning algorithm. The Deep Deterministic Policy Gradient (DDPG) algorithm, which is well suited to continuous action spaces, is used to optimize the acceleration process of the propfan engine. The simulation results show that, compared with the unoptimized adjustment process, the transition-state adjustment time of the engine is reduced by 32.5% under the optimal adjustment law obtained by the DDPG algorithm. The DDPG algorithm can therefore be applied to performance optimization of the engine acceleration process and achieves a good transition-state performance optimization effect.

  • Research Article
  • 10.62051/ijcsit.v2n1.22
Energy Management for Hybrid Energy Storage System in Electric Vehicles Based on Deep Deterministic Policy Gradient
  • Mar 22, 2024
  • International Journal of Computer Science and Information Technology
  • Shuai Xia + 1 more

In this paper, an intelligent control system design scheme based on the deep deterministic policy gradient (DDPG) algorithm is proposed for the complex continuous-action-space problem in the hybrid energy storage system of electric vehicles. First, the basic principles and internal logic of the DDPG algorithm are introduced, including key elements such as the Actor–Critic architecture, experience replay, the target network, the reward signal, the policy gradient, and value-function updates. Then, how to apply the DDPG algorithm to the industrial control system is described in detail: the Actor network learns the optimal strategy, the Critic network evaluates the value of each state-action pair, and experience replay and the target network are used to improve system stability and performance. Finally, the effectiveness of the DDPG-based intelligent control system in a complex environment is verified by simulation experiments. The results show that the system can effectively optimize the control strategy, improve the response speed and stability of the system, and has good engineering application prospects.
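The target-network mechanism mentioned in this abstract is commonly implemented as a soft (Polyak) update. A minimal sketch of that standard mechanism follows; this is generic DDPG machinery, not this paper's code, and the `tau` value is an assumed default:

```python
def soft_update(target_params, source_params, tau=0.005):
    """Soft (Polyak) target-network update used to stabilize DDPG training:
    each target parameter slowly tracks its online counterpart.
    Illustrative sketch over plain lists of floats; real implementations
    apply the same rule to network weight tensors in place."""
    return [(1.0 - tau) * t + tau * s
            for t, s in zip(target_params, source_params)]
```

With a small `tau` (e.g. 0.005), the target network changes only slightly per update, which damps the moving-target problem in temporal-difference learning.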

  • Research Article
  • Cited by 4
  • 10.3390/math11010132
Stability Analysis for Autonomous Vehicle Navigation Trained over Deep Deterministic Policy Gradient
  • Dec 27, 2022
  • Mathematics
  • Mireya Cabezas-Olivenza + 4 more

The Deep Deterministic Policy Gradient (DDPG) algorithm is a reinforcement learning algorithm that combines Q-learning with a policy. Nevertheless, this algorithm generates failures that are not well understood. Rather than looking for those errors, this study presents a way to evaluate the suitability of the results obtained. Taking autonomous vehicle navigation as the application, the DDPG algorithm is applied, obtaining an agent capable of generating trajectories. This agent is evaluated in terms of stability through the Lyapunov function, verifying whether the proposed navigation objectives are achieved. The reward function of the DDPG is used because it is unknown whether the neural networks of the actor and the critic are correctly trained. Two agents are obtained and compared in terms of stability, demonstrating that the Lyapunov function can be used as an evaluation method for agents obtained by the DDPG algorithm. By verifying stability over a fixed future horizon, it is possible to determine whether the obtained agent is valid and can be used as a vehicle controller, so a task-satisfaction assessment can be performed. Furthermore, the proposed analysis indicates which parts of the navigation area are insufficiently covered by training.

  • Research Article
  • Cited by 5
  • 10.3390/sym13061061
Deep Deterministic Policy Gradient Algorithm Based on Convolutional Block Attention for Autonomous Driving
  • Jun 12, 2021
  • Symmetry
  • Yanliang Jin + 3 more

The research on autonomous driving based on deep reinforcement learning algorithms is a research hotspot. Traditional autonomous driving requires human involvement, and autonomous driving algorithms based on supervised learning must be trained in advance using human experience. To deal with autonomous driving problems, this paper proposes an improved end-to-end deep deterministic policy gradient (DDPG) algorithm based on the convolutional block attention mechanism, called the multi-input attention prioritized deep deterministic policy gradient algorithm (MAPDDPG). Both the actor network and the critic network of the model have the same structure with symmetry. Meanwhile, the attention mechanism is introduced to help the vehicles focus on useful environmental information. The experiments are conducted in the open racing car simulator (TORCS), and the results of five experiment runs on the test tracks are averaged to obtain the final result. Compared with the state-of-the-art algorithm, the maximum reward increases from 62,207 to 116,347, and the average speed increases from 135 km/h to 193 km/h, while the number of successful episodes completing a circle increases from 96 to 147. Also, the variance of the distance from the vehicle to the center of the road is compared, and the result indicates that the variance of the DDPG is 0.6 m while that of the MAPDDPG is only 0.2 m. The above results indicate that the proposed MAPDDPG achieves excellent performance.

  • Book Chapter
  • Cited by 1
  • 10.1007/978-3-030-93049-3_11
An Improved DDPG Algorithm with Barrier Function for Lane-Change Decision-Making of Intelligent Vehicles
  • Jan 1, 2021
  • Tianshuo Feng + 3 more

Because lane changing is a decision-making problem involving interaction between vehicles, it is difficult to describe the intelligent-vehicle lane-change state space with a rule-based decision system. The deep deterministic policy gradient (DDPG) algorithm offers good performance for autonomous driving decisions, but still converges slowly and has a high collision probability during learning when applied to lane changing. We therefore propose an improved deep deterministic policy gradient algorithm with a barrier function (DDPG-BF) to address these problems. The barrier function is constructed from the safety distance required for lane changes, and DDPG optimization is improved by guiding the vehicle to choose actions within safety constraints. Simulation results on TORCS confirm that the proposed method converges within hundreds of training episodes and reduces the unsafe behavior ratio to less than 0.05. Compared with the DDPG and FEC-DDPG algorithms, the proposed method improves the convergence speed of learning and maintains a safe distance between vehicles during lane changes.

  • Research Article
  • Cited by 15
  • 10.37965/jait.2021.12003
UAV maneuvering decision-making algorithm based on Twin Delayed Deep Deterministic Policy Gradient Algorithm
  • Dec 7, 2021
  • Journal of Artificial Intelligence and Technology
  • Shuangxia Bai + 5 more

Aiming at intelligent decision-making of UAVs based on situation information in air combat, a novel maneuvering decision method based on deep reinforcement learning is proposed in this paper. The autonomous maneuvering model of the UAV is established as a Markov Decision Process. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm and the Deep Deterministic Policy Gradient (DDPG) algorithm from deep reinforcement learning are used to train the model, and the experimental results of the two algorithms are analyzed and compared. The simulation results show that, compared with the DDPG algorithm, the TD3 algorithm has stronger decision-making performance and faster convergence, and is more suitable for solving combat problems. The proposed algorithm enables UAVs to autonomously make maneuvering decisions based on situation information such as position, speed, and relative azimuth, adjusting their actions to approach and successfully strike the enemy, and provides a new method for intelligent UAV maneuvering decisions in air combat.

  • Research Article
  • 10.2478/amns-2024-3426
Multi-Microgrid Energy Trading Strategy Based on Multi-Agent Deep Deterministic Policy Gradient Algorithm
  • Jan 1, 2024
  • Applied Mathematics and Nonlinear Sciences
  • Genhong Qi + 5 more

Compared to an individual microgrid, a multi-microgrid (MMG) system can enhance the overall utilization of renewable energy, effectively improve the operational stability of local microgrids, and reduce dependence on the main grid. However, energy management of an MMG encounters significant challenges due to the complex interaction between different microgrids. To tackle this issue, this paper introduces a non-cooperative game-based optimal scheduling market trading model for an MMG composed of various renewable energy sources, completing trade decisions while ensuring information independence. Considering the real-time changes in environmental transition functions and complex scheduling scenarios, the multi-agent deep deterministic policy gradient (MADDPG) algorithm is employed, which modifies the experience replay mechanism and Markov process of the basic deep deterministic policy gradient (DDPG) algorithm. Compared to traditional multi-microgrid scheduling algorithms, the method presented in this paper does not require individual prediction of state variables, achieves end-to-end training from agent states to actions, and ensures the information security and autonomous decision-making of each microgrid.

  • Research Article
  • Cited by 113
  • 10.1016/j.ast.2019.05.058
Morphing control of a new bionic morphing UAV with deep reinforcement learning
  • May 28, 2019
  • Aerospace Science and Technology
  • Dan Xu + 3 more


  • Research Article
  • Cited by 1
  • 10.3390/pr11072155
Adaptive Optimization Design of Building Energy System for Smart Elderly Care Community Based on Deep Deterministic Policy Gradient
  • Jul 19, 2023
  • Processes
  • Chunmei Liu + 1 more

In smart elderly care communities, optimizing the design of building energy systems is crucial for improving the quality of life and health of the elderly. This study pioneers an innovative adaptive optimization design methodology for building energy systems by harnessing the capabilities of deep reinforcement learning. The method first models the many energy devices embedded in the energy ecosystem of smart elderly care community buildings and derives their energy computation formulae. The study then employs the actor–critic (AC) framework to refine the deep deterministic policy gradient (DDPG) algorithm, and the enhanced DDPG algorithm is used to adaptively optimize the operational states of the energy system of a smart elderly care community building. Simulation experiments indicate that the proposed method has better stability and convergence than traditional deep Q-learning algorithms. When the environmental interaction coefficient and learning ratio are 4, the improved DDPG algorithm under the AC framework converges after 60 iterations, with a stable reward value of −996 in the convergence state. When the scheduling cycle of the energy system is between 0:00 and 8:00, the photovoltaic output of the system optimized by the DDPG algorithm is 0, and the wind power output fluctuates within 50 kW. This study realizes efficient operation, energy saving, and emission reduction in building energy systems in smart elderly care communities, provides new ideas and methods for research in this field, and offers an important reference for the design and operation of such systems.

  • Conference Article
  • Cited by 1
  • 10.1109/icma54519.2022.9856006
Research on Decision-Making Method of Unmanned Tractor-Trailer Based on T-DDPG
  • Aug 7, 2022
  • Jian Wang + 4 more

Deep reinforcement learning has excellent performance in decision-making and is widely used in areas such as autonomous driving. To improve the decision-making ability of tractor-trailers, this paper presents an unmanned tractor-trailer decision-making model based on the Deep Deterministic Policy Gradient (DDPG) algorithm. The DDPG algorithm suffers from slow convergence in the early stage, poor stability, and a tendency to fall into local minima. Combined with the kinematic characteristics of the tractor-trailer, an improved DDPG algorithm is proposed. Adding improved artificial potential fields to the DDPG algorithm avoids local minima, and compressing the agent's state and action sets improves training speed. Algorithm stability is improved by adding a penalty term for a large angle between tractor and trailer and a penalty term for deviating from the desired trajectory. Experimental results show that the improved model increases the learning efficiency, safety, and average velocity of the tractor-trailer while ensuring effective decision-making.

  • Research Article
  • 10.19693/j.issn.1673-3185.02057
Collision avoidance path planning of tourist ship based on DDPG algorithm
  • Oct 1, 2021
  • DOAJ (DOAJ: Directory of Open Access Journals)
  • Yi Zhou + 3 more

Objective: As the core issue of ship navigation safety, ship collision avoidance has always depended on the captain's personal state and judgment, which entails certain safety hazards. To coordinate all ships (cruise ships, cargo ships, etc.) in key waters and predict their paths, it is necessary to establish an anti-collision early warning mechanism. Methods: Using the deep deterministic policy gradient (DDPG) algorithm and the ship domain model, the ship's navigation path is simulated on an electronic chart. An improved DDPG strategy based on focused learning of the failure region, together with ship-domain-model parameters adapted to the characteristics of cruise ships, is proposed to improve the accuracy of route prediction and collision avoidance. Results: With the improved DDPG algorithm and the improved ship domain model, the accuracy of ship collision avoidance improves from 84.9% to 89.7% compared with the previous algorithm, and the average error between the simulated route and the real route is reduced from 25.2 m to 21.4 m. Conclusion: Ship collision-avoidance path planning based on the improved DDPG algorithm and the improved ship domain model can realize route supervision in the water area; when a predicted route intersects with another ship's, the dispatcher is alerted, realizing the anti-collision early warning mechanism.
