Watermarks for Deep Reinforcement Learning

  • Abstract
  • Literature Map
  • Similar Papers
Abstract

In this chapter, we introduce a new watermarking scheme for protecting deep reinforcement learning models. Various watermarking approaches have been proposed to protect the intellectual property of deep learning models. However, given the complexity and stochasticity of reinforcement learning tasks, existing watermarking techniques for deep learning models cannot be applied directly to the deep reinforcement learning scenario. Existing approaches adopt backdoor methods to embed special sample–label pairs into protected models, then query suspicious models with these designed samples to claim and verify ownership. Challenges arise when applying such solutions to deep reinforcement learning models. Unlike conventional deep learning models, which produce a single output for each discrete input at one time instant, a reinforcement learning model's current predictions affect its subsequent states. Consequently, if we apply discrete watermarking methods to deep reinforcement learning models, the temporal decision-making characteristics and high randomness of deep reinforcement learning strategies may reduce verification accuracy. Moreover, existing discrete watermarking approaches may degrade the performance of the target deep reinforcement learning model. Motivated by these limitations, in this chapter we introduce a novel watermark concept, temporal watermarks, which preserves the performance of the protected model while achieving high-fidelity ownership verification. The proposed temporal watermarking method applies to both deterministic and stochastic reinforcement learning algorithms.
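To make the verification idea concrete, here is a minimal, illustrative sketch of temporal watermark verification: the verifier runs a policy over a chosen sequence of states and compares the empirical action distribution against the distribution the owner embedded. The function names, the toy policy, and the threshold are all assumptions for illustration, not the chapter's actual algorithm.

```python
def verify_temporal_watermark(policy, watermark_states, expected_dist, threshold=0.2):
    """Toy verification: run the policy over a sequence of watermark states
    and compare its empirical action distribution with the distribution the
    owner embedded at training time, using total variation distance."""
    counts = {}
    for s in watermark_states:
        a = policy(s)
        counts[a] = counts.get(a, 0) + 1
    n = len(watermark_states)
    # total variation distance over the owner's expected actions
    tv = 0.5 * sum(abs(counts.get(a, 0) / n - p) for a, p in expected_dist.items())
    return tv <= threshold

# hypothetical watermarked policy: prefers action 1 on even-numbered states
def toy_policy(state):
    return 1 if state % 2 == 0 else 0
```

Because the evidence is a distribution over a state sequence rather than a single sample–label pair, such a check can tolerate the stochasticity of individual actions.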

Similar Papers
  • Conference Article
  • Cited by: 1
  • 10.65109/sebo3603
Temporal Watermarks for Deep Reinforcement Learning Models
  • May 3, 2021
  • Kangjie Chen + 4 more

Watermarking has become a popular and attractive technique to protect the Intellectual Property (IP) of Deep Learning (DL) models. However, very few studies explore the possibility of watermarking Deep Reinforcement Learning (DRL) models. Common approaches in the DL context embed backdoors into the protected model and use special samples to verify the model ownership. These solutions are easily detected and can potentially affect the performance and behaviors of the target model. Such limitations make existing solutions less applicable to safety- and security-critical tasks and scenarios, where DRL has been widely used. In this work, we propose a novel watermarking scheme for DRL protection. Instead of using spatial watermarks as in DL models, we introduce temporal watermarks, which can reduce the potential impact and damage to the target model, while achieving ownership verification with high fidelity. Specifically, (1) we design a new damage metric to select sequential states for watermark generation; (2) we introduce a new reward function to efficiently alter the model's behaviors for watermark embedding; (3) we propose to utilize a predefined probability density function of actions over the watermark states as the verification evidence. The integration of these techniques enables a DRL model owner to embed the watermarks for ownership verification and IP protection. Our method is general and can be applied to various DRL tasks with either deterministic or stochastic reinforcement learning algorithms. Extensive experimental results show that it can effectively preserve the functionality of DRL models and exhibit significant robustness against common model modifications, e.g., fine-tuning and model compression.
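The reward-shaping step (2) can be pictured with a small sketch: on owner-selected watermark states, the task reward is augmented with a bonus that nudges the policy toward an owner-chosen behaviour, leaving all other states untouched. The state set, target action, and weighting below are hypothetical placeholders, not values from the paper.

```python
WATERMARK_STATES = {3, 7, 11}   # hypothetical states chosen by a damage metric
TARGET_ACTION = 2               # hypothetical behaviour the owner wants to embed
LAMBDA = 0.5                    # strength of the watermark reward term

def watermark_reward(state, action, env_reward, lam=LAMBDA):
    """Augment the task reward with a bonus that rewards the owner-chosen
    action (and penalizes others) on watermark states only."""
    if state in WATERMARK_STATES:
        bonus = 1.0 if action == TARGET_ACTION else -1.0
        return env_reward + lam * bonus
    return env_reward
```

Training with such an augmented reward alters behaviour only on the watermark states, which is what limits the impact on the model's normal functionality.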

  • Research Article
  • Cited by: 85
  • 10.1155/2022/9023719
Intrusion Detection System for Industrial Internet of Things Based on Deep Reinforcement Learning
  • Jan 1, 2022
  • Wireless Communications and Mobile Computing
  • Sumegh Tharewal + 5 more

The Industrial Internet of Things (IIoT) has grown significantly in recent years. While industrial digitalization, automation, and intelligence introduced a slew of cyber risks, the complex and varied IIoT environment also provides a new attack surface for network attackers. As a result, conventional intrusion detection technology cannot satisfy the network threat discovery requirements of today's IIoT environment. In this research, the authors use reinforcement learning rather than supervised or unsupervised learning because it improves the decision-making ability of the learning process: deep networks transform large-scale raw input data into higher-level abstract representations through simple, nonlinear transformations, while trial-and-error learning driven by feedback signals from interaction with the environment finds a good solution even in the absence of guiding knowledge. This article presents a proximal policy optimization method for an IIoT intrusion detection system based on a deep reinforcement learning algorithm. The method combines deep learning's observation capability with reinforcement learning's decision-making capability to enable efficient detection of different kinds of cyberattacks on the IIoT. The DRL-IDS intrusion detection system is built on a feature selection method based on LightGBM, which efficiently selects the most informative feature set from IIoT data; paired with deep learning algorithms, it effectively detects intrusions.
The application first uses LightGBM's feature selection algorithm to extract the most informative feature set from IIoT data; then, in conjunction with the deep learning algorithm, the hidden layer of a multilayer perceptron network serves as the shared network structure for the value network and policy network in the PPO2 algorithm; finally, the intrusion detection model is constructed using the PPO2 algorithm with ReLU activations. Numerous tests conducted on a publicly available IIoT dataset demonstrate that the proposed intrusion detection system detects 99 percent of different kinds of network attacks on the IIoT, with an accuracy improvement of 0.9 percent. Its accuracy, precision, recall, F1 score, and other performance indicators are superior to those of existing intrusion detection systems based on deep learning models such as LSTM, CNN, and RNN, as well as deep reinforcement learning models such as DDQN and DQN.
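For readers unfamiliar with PPO2, its core is the clipped surrogate objective; the per-sample sketch below illustrates the idea (the clipping constant 0.2 is the common default, not necessarily what DRL-IDS uses).

```python
def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Per-sample PPO clipped surrogate objective, where
    ratio = pi_new(a|s) / pi_old(a|s). Clipping the ratio keeps each
    policy update close to the old policy, stabilizing training."""
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return min(ratio * advantage, clipped * advantage)
```

For example, with a positive advantage the objective stops growing once the probability ratio exceeds 1 + clip_eps, so the optimizer has no incentive to move the policy too far in one step.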

  • Book Chapter
  • Cited by: 9
  • 10.1007/978-3-030-31978-6_12
Visual Rationalizations in Deep Reinforcement Learning for Atari Games
  • Jan 1, 2019
  • Laurens Weitkamp + 2 more

Because deep learning performs well in high-dimensional problems, deep reinforcement learning agents succeed at challenging tasks such as Atari 2600 games. However, clearly explaining why a certain action is taken by the agent can be as important as the decision itself. Deep reinforcement learning models, like other deep learning models, tend to be opaque in their decision-making process. In this work, we propose to make deep reinforcement learning more transparent by visualizing the evidence on which the agent bases its decision. In particular, we emphasize the importance of producing a justification for an observed action, which could be applied to a black-box decision agent.

  • Book Chapter
  • 10.4018/979-8-3693-4326-5.ch021
Improving Autonomous Vehicle Technology Through Reinforcement Learning and Deep Learning Models
  • Sep 6, 2024
  • Shabanam Kumari + 3 more

These days, the advancement of Artificial Intelligence (AI) is rapidly expanding across various technological domains, with Autonomous Vehicle (AV) research being just one example. In this book chapter, we discuss improving autonomous vehicle technology through reinforcement learning and deep learning models and algorithms. Autonomous vehicles are a game-changing technology with the potential to revolutionize mobility, increase safety, and improve transportation. The widespread use of AVs, however, depends on resolving important issues, especially regarding their decision-making procedures. This chapter explores how autonomous vehicle technology can be greatly advanced by using deep learning (DL) and reinforcement learning (RL) models to improve decision-making. The chapter aims to accomplish two main goals. Firstly, in the context of autonomous vehicles, to clarify the theoretical underpinnings and real-world applications of deep learning and reinforcement learning.

  • Research Article
  • Cited by: 7
  • 10.1016/j.engappai.2024.108925
Harnessing deep reinforcement learning algorithms for image categorization: A multi algorithm approach
  • Jul 17, 2024
  • Engineering Applications of Artificial Intelligence
  • Dhanvanth Reddy Yerramreddy + 4 more

  • Research Article
  • Cited by: 1
  • 10.1080/09544828.2024.2366686
Leveraging deep reinforcement learning for design space exploration with multi-fidelity surrogate model
  • Jun 25, 2024
  • Journal of Engineering Design
  • Haokun Li + 5 more

Design automation is undergoing a new generation of changes driven by artificial intelligence technologies represented by deep learning and reinforcement learning. Notably, the advantages of deep reinforcement learning in addressing solution optimisation and decision-making tasks with cognitive automation functionality have garnered attention in design. In the context of surrogate model-driven engineering design optimisation, this paper addresses current research challenges such as reliance on domain knowledge for local development and shortcomings in the self-learning and adaptive capabilities of optimisation algorithms for global exploration. Centred around the deep reinforcement learning model Deep Q-learning, and complemented by self-organising maps and neural network technologies, we propose a methodology that uses multi-fidelity simulation data for design space exploration. This approach effectively reduces sampling costs and enables the optimisation model to learn the optimal direction for high-precision predictions and achieve rapid, accurate optimisation. Finally, the effectiveness of the proposed method is comprehensively validated through four typical optimisation scenarios and a case study involving the optimisation of a wheeled robot's suspension swing arm structure. This work will be a crucial reference for applying deep reinforcement learning in simulation-driven engineering design optimisation.

  • Research Article
  • Cited by: 22
  • 10.1109/jiot.2022.3168317
Intelligent Fault Quantitative Identification for Industrial Internet of Things (IIoT) via a Novel Deep Dual Reinforcement Learning Model Accompanied With Insufficient Samples
  • Oct 15, 2022
  • IEEE Internet of Things Journal
  • Yuanhong Chang + 5 more

The Industrial Internet of Things (IIoT) is mainly a data-oriented network, so intelligent processing of massive data is needed to realize interconnection between machines. Currently, deep-learning-based methods are widely applied in the intelligent construction of the IIoT to maximize the self-monitoring and self-management capabilities of various machines. However, the quantity and quality of data and the optimization of parameters greatly limit such methods. As a breakthrough in artificial intelligence (AI), deep reinforcement learning (DRL) provides inspiration and direction, combining the advantages of deep learning and reinforcement learning to construct an end-to-end fault identification system. Therefore, a novel deep dual reinforcement learning model was proposed, consisting of an actor model and a critic model. The dual structure avoids over-self-optimization of the network. The actor model continually learns to identify unknown samples via the ε-greedy algorithm, while the critic model dynamically adjusts the policy to guide the actor model in the right training direction. The effectiveness of the proposed method was verified on three bearing datasets. The results indicate that the proposed method enables agents to independently realize precise quantitative fault identification. The establishment of an experience storage unit overcomes the problem of insufficient samples and avoids blind trial and error by the proposed model.
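The ε-greedy rule mentioned above is a standard exploration strategy: with probability ε the agent tries a uniformly random action, otherwise it exploits the action with the highest estimated value. A minimal sketch (the Q-values here are placeholders, not from the paper):

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore a uniformly random action,
    otherwise exploit the greedy (highest-Q) action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Annealing ε from a high to a low value over training is the usual way to shift from exploration toward exploitation.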

  • Research Article
  • 10.20532/cit.2024.1005825
High-Frequency Quantitative Trading of Digital Currencies Based on Fusion of Deep Reinforcement Learning Models with Evolutionary Strategies
  • Jul 15, 2024
  • Journal of Computing and Information Technology
  • Yijun He + 2 more

High-frequency quantitative trading in the emerging digital currency market poses unique challenges due to the lack of established methods for extracting trading information. This paper proposes a deep evolutionary reinforcement learning (DERL) model that combines deep reinforcement learning with evolutionary strategies to address these challenges. Reinforcement learning is applied to data cleaning and factor extraction from a high-frequency, microscopic viewpoint to quantitatively explain the supply and demand imbalance and to create trading strategies. In order to determine whether the algorithm can successfully extract the significant hidden features in the factors when faced with large and complex high-frequency factors, this paper trains the agent in reinforcement learning using three different learning algorithms, including Q-learning, evolutionary strategies, and policy gradient. The experimental dataset, which contains data on sharp up, sharp down, and continuous oscillation situations, was chosen to test Bitcoin in January-February, September, and November of 2022. According to the experimental results, the evolutionary strategies algorithm achieved returns of 59.18%, 25.14%, and 22.72%, respectively. The results demonstrate that deep reinforcement learning based on the evolutionary strategies outperforms Q-learning and policy gradient concerning risk resistance and return capability. The proposed approach offers a robust and adaptive solution for high-frequency trading in the digital currency market, contributing to the development of effective quantitative trading strategies.
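As a rough illustration of the evolutionary-strategies component compared above, the sketch below implements a basic reward-weighted perturbation update in the spirit of natural evolution strategies; the population size, noise scale, and learning rate are illustrative assumptions, not the paper's settings.

```python
import random

def es_step(params, reward_fn, pop_size=50, sigma=0.1, lr=0.05, rng=random):
    """One evolution-strategies update: sample Gaussian perturbations of the
    parameters, evaluate each perturbed candidate, and move the parameters
    along the reward-weighted noise direction."""
    n = len(params)
    noises, rewards = [], []
    for _ in range(pop_size):
        eps = [rng.gauss(0, 1) for _ in range(n)]
        noises.append(eps)
        rewards.append(reward_fn([p + sigma * e for p, e in zip(params, eps)]))
    mean = sum(rewards) / pop_size
    std = (sum((r - mean) ** 2 for r in rewards) / pop_size) ** 0.5 or 1.0
    # gradient estimate: normalized rewards weight each noise direction
    return [p + lr / (pop_size * sigma) *
            sum((r - mean) / std * eps[i] for r, eps in zip(rewards, noises))
            for i, p in enumerate(params)]
```

Because only episode returns are needed, such updates sidestep backpropagation through the trading policy, which is one reason ES can be robust on noisy reward signals.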

  • Research Article
  • Cited by: 28
  • 10.3390/jrfm13040071
Deep Reinforcement Learning in Agent Based Financial Market Simulation
  • Apr 11, 2020
  • Journal of Risk and Financial Management
  • Iwao Maeda + 6 more

Prediction of financial market data with deep learning models has achieved some recent success. However, historical financial data suffer from an unknowable state space, limited observations, and the inability to model the impact of one's own actions on the market, which can be prohibitive when trying to find investment strategies using deep reinforcement learning. One way to overcome these limitations is to augment real market data with agent-based artificial market simulation. Artificial market simulations designed to reproduce realistic market features can be used to create unobserved market states, to model the impact of one's own investment actions on the market itself, and to train models with as much data as necessary. In this study we propose a framework for training deep reinforcement learning models in agent-based artificial price-order-book simulations that yields non-trivial policies under diverse conditions with market impact. Our simulations confirm that the proposed deep reinforcement learning model with a unique task-specific reward function was able to learn a robust investment strategy with an attractive risk-return profile.

  • Research Article
  • Cited by: 79
  • 10.1016/j.eswa.2019.112872
Time-driven feature-aware jointly deep reinforcement learning for financial signal representation and algorithmic trading
  • Aug 14, 2019
  • Expert Systems with Applications
  • Kai Lei + 4 more

  • Conference Article
  • Cited by: 59
  • 10.1109/ei250167.2020.9347147
Explainable AI in Deep Reinforcement Learning Models: A SHAP Method Applied in Power System Emergency Control
  • Oct 30, 2020
  • Ke Zhang + 2 more

The application of artificial intelligence (AI) systems is increasingly extensive, and using explainable AI (XAI) technology to explain why machine learning (ML) models make certain predictions is as important as the accuracy of those predictions, because it ensures trust and transparency in the model's decision-making process. Although outstanding progress based on deep reinforcement learning (DRL) has been made in many fields, DRL models are difficult to explain and cannot be used in safety-related occasions. Especially in power systems, providing an intuitive and reliable XAI technique for DRL-based emergency control is urgent and necessary. The Shapley additive explanations (SHAP) method has been adopted to provide a reasonable interpretable model for an open-source platform named Reinforcement Learning for Grid Control (RLGC). Through a series of summary plots, force plots, and SHAP value probabilities, DRL-based under-voltage load shedding in power systems can be interpreted much more easily and clearly. More importantly, this work is unique in the power system field, presenting the first use of the SHAP method and the probability of SHAP value to explain DRL-based emergency control in power systems.
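The SHAP idea above can be illustrated with exact Shapley values computed by brute-force subset enumeration, which is feasible only for a handful of features (real SHAP tooling approximates this computation). The tiny linear model and zero baseline below are assumptions for illustration only.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for a small model: average each feature's
    marginal contribution over all coalitions, replacing missing features
    with a baseline value (the idea underlying SHAP explanations)."""
    n = len(x)
    def f(subset):
        return model([x[i] if i in subset else baseline[i] for i in range(n)])
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for s in combinations(others, k):
                # weight of a coalition of size k in the Shapley formula
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (f(set(s) | {i}) - f(set(s)))
    return phi

# for a linear model, Shapley values recover each feature's contribution exactly
lin = lambda z: 2 * z[0] + 3 * z[1]
```

The values satisfy the efficiency property: they sum to the difference between the model's output on the explained input and on the baseline, which is what makes SHAP attributions additive.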

  • Conference Article
  • Cited by: 3
  • 10.1109/cec.2019.8790001
Deep Multi-agent Reinforcement Learning in a Common-Pool Resource System
  • Jun 1, 2019
  • Hanwei Zhu + 1 more

In complex social-ecological systems, multiple agents with diverse objectives take actions that affect the long-term dynamics of the system. Common pool resources are a subset of such systems, where property rights are typically poorly defined and dynamics are unknown a priori, creating a social dilemma reflected by the well-known ‘tragedy of the commons.’ In this paper, we investigated the efficacy of deep reinforcement learning in a multi-agent setting of a common pool resource system. We used an abstract mathematical model of the system, represented as a partially-observable general-sum Markov game. In the first set of experiments, the independent agents used a deep Q-Network with discrete action spaces to guide decision-making. However, significant shortcomings were evident. Consequently, in a second set of experiments, a Deep Deterministic Policy Gradient learning model with continuous state and action spaces guided agent learning. Simulation results show that agents performed significantly better in terms of both sustainability and economic goals when using the second deep learning model. Despite the fact that agents do not have perfect foresight nor understanding of the implications of their ‘harvesting’ efforts, deep reinforcement learning can be used effectively to ‘learn in the commons.’

  • Conference Article
  • Cited by: 18
  • 10.5220/0007722000520058
An Empirical Research on the Investment Strategy of Stock Market based on Deep Reinforcement Learning model
  • Jan 1, 2019
  • Yuming Li + 2 more

The stock market plays a major role in the entire financial market, and how to obtain effective trading signals in the stock market is a long-discussed topic. This paper first reviews Deep Reinforcement Learning theory and models, validates the models through empirical data, and compares the benefits of three classical Deep Reinforcement Learning models. From the perspective of an automated stock market investment decision-making mechanism, the Deep Reinforcement Learning model offers a useful reference for constructing automated investor models, building stock market investment strategies, applying artificial intelligence in financial investment, and improving investors' strategy yields.

  • Research Article
  • Cited by: 25
  • 10.1016/j.oceaneng.2023.116527
Deep reinforcement learning based collision avoidance system for autonomous ships
  • Dec 12, 2023
  • Ocean Engineering
  • Yong Wang + 6 more

  • Research Article
  • 10.1158/1538-7445.am2021-184
Abstract 184: The utility of deep metric learning for breast cancer identification on mammographic images
  • Jul 1, 2021
  • Cancer Research
  • Justin Du + 8 more

Purpose: Although deep learning (DL) models have shown increasing ability to accurately classify diagnostic images in oncology, significantly large amounts of well-curated data are often needed to match human-level performance. Given the relative paucity of imaging datasets for less prevalent cancer types, there is an increasing need for methods that can improve the performance of deep learning models trained on limited diagnostic images. Deep metric learning (DML) is a potential method to improve the accuracy of deep learning models trained on limited datasets. Leveraging a triplet-loss function, DML exponentially increases effective training data compared to a traditional DL model. In this study, we investigated the utility of DML to improve the accuracy of DL models trained to classify cancerous lesions found on screening mammograms. Methods: Using a dataset of 2620 lesions found on routine screening mammograms, we trained both traditional DL and DML models to classify suspicious lesions as cancerous or benign. The VGG16 architecture was used as the basis for both models. Model performance was compared by calculating accuracy, sensitivity, and specificity on a blinded test set of 378 lesions. In addition to individual model performance, we also measured agreement accuracy when the DL and DML models were combined. Sub-analyses were conducted to identify phenotypes best suited for each model type. Both models underwent hyperparameter optimization to identify ideal batch size, learning rate, and regularization to prevent overfitting. Results: We found that the combination of the traditional DL model with the DML model resulted in the highest overall accuracy (78.7%), representing a 7.1% improvement compared to the traditional DL model (p < .001). Alone, the traditional DL model had improved accuracy compared to the DML model (71.4% vs 66.4%).
The traditional DL model had higher sensitivity (94.8% vs 73.6%) but lower specificity (34.7% vs 55.1%) compared to the DML model. Sub-analyses suggested the traditional DL model was more accurate on higher-density breasts, whereas the DML model was more accurate on lower-density breasts. Additionally, the traditional DL model had the highest accuracy on oval-shaped lesions, whereas the DML model was most accurate on irregularly shaped breast lesions. Conclusion: Our study suggests that adding DML models to traditional DL models can improve diagnostic image classification performance in cancer. Our results suggest DML models may provide increased specificity and help with classification of unique populations often misclassified by traditional DL models. Further studies investigating the utility of DML on other cancer imaging tasks are necessary to build more robust DL models in cancer imaging. Citation Format: Justin Du, Sachin Umrao, Enoch Chang, Marina Joel, Aidan Gilson, Guneet Janda, Rachel Choi, Yongfeng Hui, Sanjay Aneja. The utility of deep metric learning for breast cancer identification on mammographic images [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 184.
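The triplet-loss function at the heart of DML can be sketched in a few lines: it pulls an anchor embedding toward a same-class (positive) example and pushes it away from a different-class (negative) example until the two distances are separated by a margin. This is a minimal illustration of the standard formulation, not the authors' implementation.

```python
def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss used in deep metric learning:
    max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    d = lambda u, v: sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5
    return max(d(anchor, positive) - d(anchor, negative) + margin, 0.0)
```

Because every (anchor, positive, negative) combination forms a training example, the number of usable triplets grows combinatorially with dataset size, which is the "exponentially increases training data" effect the abstract describes.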
