
On the need for abstract, deep reinforcement learning models in neuroscience.


Similar Papers
  • Dissertation
  • 10.62791/19723
Temporal analysis of healthcare trajectories using deep learning
  • Jan 1, 2023
  • Riazat Ryan

Electronic Health Records (EHRs) have gained significant importance in recent years due to their potential in clinical prognosis and treatment. Patients’ clinical information is temporal and crucial for predicting disease progression and providing proper care. However, the multivariate, mutually dependent, and incomplete nature of the data continually brings new challenges and fresh experimental opportunities to the field. Traditional machine learning models primarily isolate and target these problems individually and incoherently. To ensure better healthcare, computational models should be reasonable, modular, and generic across different clinical situations and setbacks. This thesis aims to develop advanced deep learning models to predict the progressive disease path and find treatment policies that increase survivability for ICU patients. An improved multi-label deep sequential model synthesizes patients’ chronological critical conditions from dynamic temporal input. Subsequently, Multi-Agent Deep Reinforcement Learning (MARL) models have been developed to form a cohesive reward based environment to recommend dynamic treatment plans by scrutinizing the grid with feedback as well as generic clinical rules. Uncertainty is very common in analyzing patients’ conditions. The study also encompasses sparse observations in the ICU and assesses them as a learning component in Deep Reinforcement Learning models in an offline fashion. In conclusion, complete healthcare encompasses both diagnosis and treatment. Consolidating sequential deep learning and Reinforcement Learning models can help achieve a better AI assistive system for physicians, which can improve a patient’s clinical procedures and offer better care.

  • Conference Article
  • Cite Count: 5
  • 10.2118/217733-ms
Deep Reinforcement Learning for Automatic Drilling Optimization Using an Integrated Reward Function
  • Feb 27, 2024
  • Xu Huang + 3 more

Drilling optimization is a complicated multi-objective process optimization problem. During drilling, drillers need to adjust WOB and RPM continuously and in a timely manner, not only to maximize ROP, but also to prevent severe vibration and maintain downhole tool durability. In this study, a virtual drilling agent using a deep reinforcement learning (RL) model is developed and trained to automatically make drilling decisions and is shown to effectively optimize drilling parameters. A deep RL model using a deep deterministic policy gradient (DDPG) algorithm is developed to optimize the drilling process. In the RL model, the reward of drilling decisions at each drilling step is a function of drilling ROP, downhole vibration tendency, bit dull state, and risks of tool failure. Separate modules to evaluate the reward of each component are implemented and trained using field and laboratory data. The deep RL model is applied and tested comprehensively on different drilling environments including hard and abrasive rock, embedded rock, and vibrational vs. stable drilling. The hyper-parameters of the actor-critic NN architecture in the RL model are carefully selected to improve model convergence. Results show the deep RL model can effectively find the optimum drilling solutions in various drilling environments. In soft formation, the RL model applies the upper limit of WOB and RPM throughout the drilling depth to maximize ROP and reduce drilling time. In hard and abrasive formation, the RL model gradually changes RPM and WOB to prevent premature wear of PDC cutters. The change of the drilling parameters is optimized based on rock abrasivity and target drilling depth. In an unstable drilling environment, the RL model limits the ratio of WOB and RPM to avoid stick-slip vibration while controlling WOB and RPM to maximize ROP and drill to TD.
In embedded formation, the RL model successfully found the optimum solution by adjusting WOB/RPM to avoid stick-slip and overloading of the bit cutting structure. The learning process of the RL model shows that hyper-parameter selection plays a critical role in model convergence and accuracy. Improperly selected hyper-parameters in the RL model can lead to failure of the solution search or to a sub-optimum solution. Overall, the RL model is shown to effectively find optimum drilling solutions in the various drilling environments and can be applied for both pre-well drilling planning and real-time drilling optimization. To the best of the authors' knowledge, this is the first attempt to develop a deep RL model for drilling optimization by implementing a combination of ROP, vibration, bit dull, and durability in the reward function. The proposed RL model can be extended to include more reward factors in the drilling optimization such as whirl and high frequency torsional oscillation (HFTO), stuck pipe, tool temperature and so on.
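As a rough illustration of the integrated reward this abstract describes, the sketch below folds ROP, vibration tendency, bit dull state, and tool-failure risk into one scalar per drilling step. The linear form, the weights, the [0, 1] normalization, and every name here are hypothetical; the paper evaluates each component with separate modules trained on field and laboratory data, which are not reproduced.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepSignals:
    """Hypothetical per-step signals, each assumed normalized to [0, 1]."""
    rop: float           # rate of penetration (higher is better)
    vibration: float     # downhole vibration tendency (lower is better)
    bit_dull: float      # bit dull state (lower is better)
    failure_risk: float  # risk of tool failure (lower is better)


def integrated_reward(s, w_rop=1.0, w_vib=0.5, w_dull=0.3, w_risk=0.7):
    """Reward ROP and penalize the three damage-related terms.

    The weights are illustrative assumptions, not values from the paper.
    """
    for name in ("rop", "vibration", "bit_dull", "failure_risk"):
        value = getattr(s, name)
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (w_rop * s.rop
            - w_vib * s.vibration
            - w_dull * s.bit_dull
            - w_risk * s.failure_risk)


# Same ROP, different vibration: the smoother step earns the higher reward.
smooth = integrated_reward(StepSignals(0.8, 0.2, 0.1, 0.1))
rough = integrated_reward(StepSignals(0.8, 0.9, 0.1, 0.1))
```

In an actor-critic setup such as DDPG, a scalar like this would be returned by the environment at each step; how the components are traded off is exactly the kind of design choice the abstract flags as affecting convergence.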

  • Dissertation
  • Cite Count: 3
  • 10.11606/t.3.2021.tde-10082021-160557
Automated stock trading system using deep reinforcement learning and price and sentiment prediction modules.
  • Jun 15, 2021
  • Roberto Fray Da Silva

Artificial intelligence models are considered state of the art in several domains. Deep reinforcement learning models, one of the main categories of artificial intelligence models, have a high potential for being applied in domains with high complexity, nonlinearities, the existence of autocorrelation, seasonal and cyclical components, and noise. One highly relevant domain that presents these characteristics is stock market trading. Recent works were conducted in this domain using deep reinforcement learning. Nevertheless, these did not consider integrating other relevant components such as price time series prediction and market sentiment analysis. Another critical gap is the lack of comparison of different deep reinforcement learning models in different stock trading scenarios. Besides being an important developing market, the Brazilian stock market is one of the 20 biggest markets in the world. A critical problem for all the investors in this stock market is how to improve the strategies and systems used for improving returns, considering their associated risks. This research aims to investigate and propose a system for automatic asset trading considering multiple features, time series prediction, sentiment analysis, and deep reinforcement learning models. The methodology used was a simulation of the market environment, considering one asset and the evaluation of two relevant scenarios. Eight versions of the proposed system were implemented and evaluated, considering six relevant domain metrics and the buy-and-hold strategy, the main baseline model in the literature. For the first scenario, which simulated a cycle with upward and downward trends, the system configuration that presented the best results used the price prediction component obtained from a recurrent neural network with a maximum order size of 200 stocks. It obtained better results than the baseline model.
For the second scenario, which simulated a deep downward trend, all the system configurations presented better results than the baseline model. The configuration using a recurrent neural network for price prediction and a maximum order size of 10 stocks presented the best results. The main contribution of this research to the deep reinforcement learning area was the proposal of a system that uses additional time series analysis and sentiment analysis features extracted with deep learning models. The main contribution of this research to stock market trading was to propose the use of deep reinforcement learning considering as features: market prices, volume traded, technical indicators, and price and market sentiment predictions obtained using deep learning models. The proposed system can be used in different markets and assets and adapted to other sub-domains.

  • Conference Article
  • Cite Count: 7
  • 10.65109/sebo3603
Temporal Watermarks for Deep Reinforcement Learning Models
  • May 3, 2021
  • Kangjie Chen + 4 more

Watermarking has become a popular and attractive technique to protect the Intellectual Property (IP) of Deep Learning (DL) models. However, very few studies explore the possibility of watermarking Deep Reinforcement Learning (DRL) models. Common approaches in the DL context embed backdoors into the protected model and use special samples to verify the model ownership. These solutions are easy to detect, and can potentially affect the performance and behaviors of the target model. Such limitations make existing solutions less applicable to safety- and security-critical tasks and scenarios, where DRL has been widely used. In this work, we propose a novel watermarking scheme for DRL protection. Instead of using spatial watermarks as in DL models, we introduce temporal watermarks, which can reduce the potential impact and damage to the target model, while achieving ownership verification with high fidelity. Specifically, (1) we design a new damage metric to select sequential states for watermark generation; (2) we introduce a new reward function to efficiently alter the model's behaviors for watermark embedding; (3) we propose to utilize a predefined probability density function of actions over the watermark states as the verification evidence. The integration of these techniques enables a DRL model owner to embed the watermarks for ownership verification and IP protection. Our method is general and can be applied to various DRL tasks with either deterministic or stochastic reinforcement learning algorithms. Extensive experimental results show that it can effectively preserve the functionality of DRL models and exhibit significant robustness against common model modifications, e.g., fine-tuning and model compression.
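To make step (3) of this abstract more concrete, the sketch below checks whether the actions a suspect policy takes on a set of watermark states follow a predefined action distribution. The abstract does not say how the comparison is done, so the total variation distance, the threshold, and all function names are assumptions chosen for illustration rather than the authors' verification procedure.

```python
from collections import Counter


def action_distribution(actions, n_actions):
    """Empirical frequency of each discrete action index in `actions`."""
    if not actions:
        raise ValueError("need at least one observed action")
    counts = Counter(actions)
    return [counts.get(a, 0) / len(actions) for a in range(n_actions)]


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def matches_watermark(actions, target_pdf, threshold=0.1):
    """True if the observed action frequencies are close to the target.

    `threshold` is a hypothetical tolerance, not a value from the paper.
    """
    observed = action_distribution(actions, len(target_pdf))
    return total_variation(observed, target_pdf) <= threshold


# Hypothetical target distribution over three actions on the watermark states.
target = [0.5, 0.25, 0.25]
print(matches_watermark([0, 0, 1, 2], target))  # frequencies equal the target
print(matches_watermark([2, 2, 2, 2], target))  # always picks action 2
```

A distribution-level check like this fits stochastic policies, where any single action on a watermark state is uninformative but the aggregate frequencies are not.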

  • Conference Article
  • Cite Count: 31
  • 10.5220/0007722000520058
An Empirical Research on the Investment Strategy of Stock Market based on Deep Reinforcement Learning model
  • Jan 1, 2019
  • Yuming Li + 2 more

The stock market plays a major role in the entire financial market. How to obtain effective trading signals in the stock market has long been a topic of discussion. This paper first reviews Deep Reinforcement Learning theory and models, validates the models with empirical data, and compares the benefits of three classical Deep Reinforcement Learning models. From the perspective of automated stock market investment decision-making, the Deep Reinforcement Learning model provides a useful reference for the construction of automated investment models for investors, the construction of stock market investment strategies, the application of artificial intelligence in the field of financial investment, and the improvement of investors' strategy yields.

  • Dissertation
  • 10.32657/10356/182221
Backdoor in deep learning: new threats and opportunities
  • Jan 1, 2025
  • Kangjie Chen

Deep learning has become increasingly popular due to its remarkable ability to learn high-dimensional feature representations. Numerous algorithms and models have been developed to enhance the application of deep learning across various real-world tasks, including image classification, natural language processing, and autonomous driving. However, deep learning models are susceptible to backdoor threats, where an attacker manipulates the training process or data to cause incorrect predictions on malicious samples containing specific triggers, while maintaining normal performance on benign samples. With the advancement of deep learning, including evolving training schemes and the need for large-scale training data, new threats in the backdoor domain continue to emerge. Conversely, backdoors can also be leveraged to protect deep learning models, such as through watermarking techniques. In this thesis, we conduct an in-depth investigation into backdoor techniques from three novel perspectives. In the first part of this thesis, we demonstrate that emerging deep learning training schemes can introduce new backdoor risks. Specifically, pre-trained Natural Language Processing (NLP) models can be easily adapted to a variety of downstream language tasks, significantly accelerating the development of language models. However, the pre-trained model becomes a single point of failure for these downstream models. We propose a novel task-agnostic backdoor attack against pre-trained NLP models, wherein the adversary does not need prior information about the downstream tasks when implanting the backdoor into the pre-trained model. Any downstream models transferred from this malicious model will inherit the backdoor, even after extensive transfer learning, revealing the severe vulnerability of pre-trained foundation models to backdoor attacks. In the second part of this thesis, we develop novel backdoor attack methods suited to new threat scenarios. 
The rapid expansion of deep learning models necessitates large-scale training data, much of which is unlabeled and outsourced to third parties for annotation. To ensure data security, most datasets are read-only for training samples, preventing the addition of input triggers. Consequently, attackers can only achieve data poisoning by uploading malicious annotations. In this practical scenario, all existing data poisoning methods that add triggers to the input are infeasible. Therefore, we propose new backdoor attack methods that involve poisoning only the labels without modifying any input samples. In the third part of this thesis, we utilize the backdoor technique to proactively protect our deep learning models, specifically for intellectual property protection. Considering the complexity of deep learning tasks, generating a well-trained deep learning model requires substantial computational resources, training data, and expertise. Therefore, it is essential to protect these assets and prevent copyright infringement. Inspired by backdoor attacks that can induce specific behaviors in target models through carefully designed samples, several watermarking methods have been proposed to protect the intellectual property of deep learning models. Model owners can train their models to produce unique outputs for certain crafted samples and use these samples for ownership verification. While various extraction techniques have been designed for supervised deep learning models, challenges arise when applying them to deep reinforcement learning models due to differences in model features and scenarios. Therefore, we propose a novel watermarking scheme to protect deep reinforcement learning models from unauthorized distribution. 
Instead of using spatial watermarks as in conventional deep learning models, we design temporal watermarks that minimize potential impact and damage to the protected deep reinforcement learning model while achieving high-fidelity ownership verification. In summary, this thesis investigates the evolving landscape of backdoor threats during the development of deep learning techniques and the use of backdoors for beneficial purposes in intellectual property protection.

  • Research Article
  • Cite Count: 35
  • 10.1016/j.oceaneng.2023.116527
Deep reinforcement learning based collision avoidance system for autonomous ships
  • Dec 12, 2023
  • Ocean Engineering
  • Yong Wang + 6 more

  • Conference Article
  • Cite Count: 6
  • 10.1109/aero50100.2021.9438291
Applicability and Challenges of Deep Reinforcement Learning for Satellite Frequency Plan Design
  • Mar 6, 2021
  • Juan Jose Garau Luis + 2 more

The study and benchmarking of Deep Reinforcement Learning (DRL) models has become a trend in many industries, including aerospace engineering and communications. Recent studies in these fields propose these kinds of models to address certain complex real-time decision-making problems in which classic approaches do not meet time requirements or fail to obtain optimal solutions. While the good performance of DRL models has been proved for specific use cases or scenarios, most studies do not discuss the compromises and generalizability of such models during real operations. In this paper we explore the tradeoffs of different elements of DRL models and how they might impact the final performance. To that end, we choose the Frequency Plan Design (FPD) problem in the context of multibeam satellite constellations as our use case and propose a DRL model to address it. We identify six different core elements that have a major effect in its performance: the policy, the policy optimizer, the state, action, and reward representations, and the training environment. We analyze different alternatives for each of these elements and characterize their effect. We also use multiple environments to account for different scenarios in which we vary the dimensionality or make the environment non-stationary. Our findings show that DRL is a potential method to address the FPD problem in real operations, especially because of its speed in decision-making. However, no single DRL model is able to outperform the rest in all scenarios, and the best approach for each of the six core elements depends on the features of the operation environment. While we agree on the potential of DRL to solve future complex problems in the aerospace industry, we also reflect on the importance of designing appropriate models and training procedures, understanding the applicability of such models, and reporting the main performance tradeoffs.

  • Research Article
  • Cite Count: 18
  • 10.1016/j.energy.2022.124140
An integrated framework based on deep learning algorithm for optimizing thermochemical production in heavy oil reservoirs
  • Apr 29, 2022
  • Energy
  • Yuhao Zhou + 1 more

  • Book Chapter
  • 10.1007/978-981-19-7554-7_7
Watermarks for Deep Reinforcement Learning
  • Nov 28, 2022
  • Kangjie Chen

In this chapter, we introduce a new watermarking scheme for deep reinforcement learning protection. To protect the intellectual property of deep learning models, various watermarking approaches have been proposed. However, considering the complexity and stochasticity of reinforcement learning tasks, we cannot apply existing watermarking techniques for deep learning models to the deep reinforcement learning scenario directly. Existing watermarking approaches for deep learning models adopt backdoor methods to embed special sample–label pairs into protected models and query suspicious models with these designed samples to claim and identify ownership. Challenges arise when applying existing solutions to deep reinforcement learning models. Unlike conventional deep learning models, which give a single output for each discrete input at one time instant, the current predicted outputs of reinforcement learning can affect subsequent states. Therefore, if we apply discrete watermark methods to deep reinforcement learning models, the temporal decision characteristics and the high randomness in deep reinforcement learning strategies may decrease the verification accuracy. Besides, existing discrete watermarking approaches may affect the performance of the target deep reinforcement learning model. In this chapter, motivated by the above limitations, we introduce a novel watermark concept, temporal watermarks, which can preserve the performance of the protected models, while achieving high fidelity ownership verification. The proposed temporal watermarking method can be applied to both deterministic and stochastic reinforcement learning algorithms.

  • Research Article
  • 10.2139/ssrn.3894285
An Integrated Framework Based on Deep Reinforcement Learning Algorithm for Optimizing Thermal and Chemical Production in Edge-Water Heavy Oil Reservoirs
  • Jan 1, 2021
  • SSRN Electronic Journal
  • Yuhao Zhou + 2 more

  • Research Article
  • Cite Count: 9
  • 10.3390/app121910145
Deep Reinforcement Learning for Vehicle Platooning at a Signalized Intersection in Mixed Traffic with Partial Detection
  • Oct 9, 2022
  • Applied Sciences
  • Hung Tuan Trinh + 2 more

The intersection management system can increase traffic capacity, vehicle safety, and the smoothness of all vehicle movement. Platoons of connected vehicles (CVs) use communication technologies to share information with each other and with infrastructures. In this paper, we proposed a deep reinforcement learning (DRL) model that applies to vehicle platooning at an isolated signalized intersection with partial detection. Moreover, we identified hyperparameters and tested the system with different numbers of vehicles (1, 2, and 3) in the platoon. To compare the effectiveness of the proposed model, we implemented two benchmark options, actuated traffic signal control (ATSC) and max pressure (MP). The experimental results demonstrated that the DRL model has many outstanding advantages compared to other models. Through the learning process, the average waiting time of vehicles in the DRL method was improved by 20% and 28% compared with the ATSC and MP options. The results also suggested that the DRL model is effective when the CV penetration rate is over 20%.

  • Research Article
  • Cite Count: 157
  • 10.1109/tits.2020.3003163
A Hybrid of Deep Reinforcement Learning and Local Search for the Vehicle Routing Problems
  • Jul 15, 2020
  • IEEE Transactions on Intelligent Transportation Systems
  • Jiuxia Zhao + 3 more

Different variants of the Vehicle Routing Problem (VRP) have been studied for decades. State-of-the-art methods based on local search have been developed for VRPs, while still facing problems of slow running time and poor solution quality in the case of large problem size. To overcome these problems, we first propose a novel deep reinforcement learning (DRL) model, which is composed of an actor, an adaptive critic and a routing simulator. The actor, based on the attention mechanism, is designed to generate routing strategies. The adaptive critic is devised to change the network structure adaptively, in order to accelerate the convergence rate and improve the solution quality during training. The routing simulator is developed to provide graph information and reward to the actor and adaptive critic. Then, we combine this DRL model with a local search method to further improve the solution quality. The output of the DRL model can serve as the initial solution for the following local search method, from where the final solution of the VRP is obtained. Tested on three datasets with customer points of 20, 50 and 100 respectively, experimental results demonstrate that the DRL model alone finds better solutions compared to construction algorithms and previous DRL approaches, while enabling a 5- to 40-fold speedup. We also observe that combining the DRL model with various local search methods yields excellent solutions at a superior generation speed, compared to that of other initial solutions.
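The hand-off this abstract describes, where a learned model's output seeds a local search, can be sketched on a single closed tour. The abstract does not name its local search methods, so 2-opt is swapped in here as a familiar stand-in, and a hard-coded list plays the role of the DRL model's output; capacity and multi-vehicle constraints of a real VRP are left out.

```python
import math


def tour_length(points, tour):
    """Length of the closed tour that visits `points` in `tour` order."""
    return sum(
        math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(points, initial_tour):
    """Improve a tour by reversing segments until no reversal shortens it.

    The first stop (index 0 of the tour) is kept fixed as the depot.
    """
    best = list(initial_tour)
    best_len = tour_length(points, best)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                candidate = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(points, candidate)
                if cand_len < best_len - 1e-12:
                    best, best_len = candidate, cand_len
                    improved = True
    return best


# Four customers on a unit square; the initial tour (standing in for the
# DRL model's output) crosses itself, and 2-opt uncrosses it.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
initial = [0, 2, 1, 3]
refined = two_opt(points, initial)
```

The better the initial tour, the fewer improving moves the search needs, which is consistent with the abstract's observation that DRL-generated starting points speed up the combined method.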

  • Book Chapter
  • 10.2174/9789815322316125010010
Advancing Aerial Monitoring with Deep Reinforcement Learning Models for Aircraft Detection in Satellite Imagery
  • Nov 26, 2025
  • Anirudh Singh + 2 more

Aircraft detection from satellite imagery is a pivotal task with multifaceted applications across surveillance, environmental monitoring, and defense. In this chapter, we present a comprehensive investigation into enhancing aircraft detection accuracy through the utilization of Deep Reinforcement Learning (DRL) models. Our research explores four prominent DRL models: Deep Q-Networks (DQN), Double DQN, Rainbow, and Proximal Policy Optimization (PPO), evaluating their performance rigorously on diverse datasets. We delve into the nuances of each model's architecture and training methodologies, aiming to identify the most effective approach for aircraft detection tasks. Through extensive experimentation and evaluation, we meticulously analyze the strengths and weaknesses of each DRL model in the context of aircraft detection. Our findings reveal compelling insights into the comparative performance of the models, shedding light on their respective capabilities. Notably, our experiments demonstrate that while all models exhibit promising capabilities, Proximal Policy Optimization emerges as the top performer, achieving an impressive accuracy rate of 98.65%. This remarkable achievement underscores the efficacy of PPO in significantly improving the accuracy and reliability of aircraft detection in satellite imagery. Furthermore, we delve into the interpretability of the models' decision-making processes, elucidating the factors influencing their performance and providing valuable insights into their inner workings. By unravelling the mechanisms behind the models' decision-making process, we aim to enhance the transparency and trustworthiness of aircraft detection systems deployed in real-world scenarios. Our research contributes significantly to the advancement of aircraft detection technology in satellite imagery, offering practical implications for improving surveillance and monitoring systems. 
By leveraging the power of deep reinforcement learning models, particularly Proximal Policy Optimization, we have paved the way for more robust and efficient aircraft detection solutions that can address the evolving challenges in remote sensing and aerial surveillance.

  • Research Article
  • 10.3390/en18071809
Optimal Power Flow for High Spatial and Temporal Resolution Power Systems with High Renewable Energy Penetration Using Multi-Agent Deep Reinforcement Learning
  • Apr 3, 2025
  • Energies
  • Liangcai Zhou + 5 more

The increasing integration of renewable energy sources (RESs) introduces significant uncertainties in both generation and demand, presenting critical challenges to the convergence, feasibility, and real-time performance of optimal power flow (OPF). To address these challenges, a multi-agent deep reinforcement learning (DRL) model is proposed to solve the OPF while ensuring constraints are satisfied rapidly. A heterogeneous multi-agent proximal policy optimization (H-MAPPO) DRL algorithm is introduced for multi-area power systems. Each agent is responsible for regulating the output of generation units in a specific area, and together, the agents work to achieve the global OPF objective, which reduces the complexity of the DRL model’s training process. Additionally, a graph neural network (GNN) is integrated into the DRL framework to capture spatiotemporal features such as RES fluctuations and power grid topological structures, enhancing input representation and improving the learning efficiency of the DRL model. The proposed DRL model is validated using the RTS-GMLC test system, and its performance is compared to MATPOWER with the interior-point iterative solver. The RTS-GMLC test system is a power system with high spatial–temporal resolution and near-real load profiles and generation curves. Test results demonstrate that the proposed DRL model achieves a 100% convergence and feasibility rate, with an optimal generation cost similar to that provided by MATPOWER. Furthermore, the proposed DRL model significantly accelerates computation, achieving up to 85 times faster processing than MATPOWER.
