Abstract

We present a differentially private actor and its eligibility trace in an actor-critic approach. In this approach, the actor takes actions by interacting directly with the environment, whereas the critic estimates only the state values obtained through bootstrapping; the actor's parameters therefore encode more detailed information about the sequence of taken actions than the critic's, and their corresponding eligibility traces share the same property. It is thus necessary to preserve the privacy of the actor and its eligibility trace when training on private or sensitive data. In this paper, we confirm the applicability of differential privacy methods to actors updated with the policy gradient algorithm and discuss the advantages of this approach over differentially private critic learning. In addition, we measure the cosine similarity between the eligibility trace with differential privacy applied and the non-private eligibility trace to analyze whether anonymity is adequately protected in the differentially private actor or critic. We conduct experiments on two synthetic examples imitating real-world problems in the medical and autonomous-navigation domains, and the results confirm the feasibility of the proposed method.

Highlights

  • Reinforcement learning (RL) defines the steps and procedures required to map situations to actions so as to maximize an accumulated reward signal [1] and serves as a practical framework for decision-making problems

  • We propose a method to protect the privacy of sensitive data corresponding to an actor and its eligibility trace during training in the actor-critic approach

  • We measured the anonymity of the eligibility trace vectors when differential privacy (DP) was applied, using cosine similarity

Summary

Introduction

Reinforcement learning (RL) defines the steps and procedures required to map situations to actions so as to maximize an accumulated reward signal [1], and it serves as a practical framework for decision-making problems. As diverse RL-based technologies are developed and deployed across computer science, the demand for private or sensitive data grows. Rather than using such raw data as they are, it is necessary to prevent personal privacy leakage while maintaining the original data’s utility. To approximate a deterministic real-valued function f : D → R with a differential privacy mechanism, we add noise calibrated to the sensitivity of f, defined as the maximum absolute distance |f(d) − f(d′)| over adjacent input data sets d, d′. Among the widely used Gaussian and Laplace noise mechanisms [2,8,16], the present study employs the Gaussian mechanism, defined as follows:
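In its standard form, the Gaussian mechanism releases M(d) = f(d) + N(0, σ²), where N(0, σ²) denotes zero-mean Gaussian noise; choosing the noise scale σ ≥ √(2 ln(1.25/δ)) · S_f / ε, with S_f the sensitivity of f defined above, guarantees (ε, δ)-differential privacy for ε ∈ (0, 1).

As a concrete illustration, the following Python sketch (a minimal example under our own assumptions, not the paper's implementation; the clipping threshold, the privacy parameters, and the stand-in trace vector are all hypothetical) clips an eligibility-trace vector to bound its sensitivity, perturbs it with the Gaussian mechanism above, and reports the cosine similarity between the private and non-private traces, mirroring the anonymity measure used in this study.

```python
# Minimal sketch: Gaussian-mechanism perturbation of an eligibility trace,
# followed by a cosine-similarity check between private and clean traces.
# All parameter values below are illustrative assumptions.
import numpy as np

def clip_l2(vec, clip_norm):
    """Rescale vec so its L2 norm is at most clip_norm, bounding the sensitivity."""
    norm = np.linalg.norm(vec)
    return vec * min(1.0, clip_norm / (norm + 1e-12))

def gaussian_mechanism(vec, sensitivity, epsilon, delta, rng):
    """Add zero-mean Gaussian noise with sigma = sqrt(2 ln(1.25/delta)) * S_f / epsilon,
    which gives (epsilon, delta)-DP for epsilon in (0, 1)."""
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon
    return vec + rng.normal(0.0, sigma, size=vec.shape)

def cosine_similarity(a, b):
    """Cosine similarity between the private and non-private trace vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(seed=0)
trace = rng.normal(size=64)              # stand-in for an actor's eligibility trace
clipped = clip_l2(trace, clip_norm=1.0)  # sensitivity assumed bounded by the clip norm
private = gaussian_mechanism(clipped, sensitivity=1.0, epsilon=0.5, delta=1e-5, rng=rng)
print(f"cosine similarity (private vs. clean): {cosine_similarity(private, clipped):.3f}")
```

Under this reading, a cosine similarity close to 1 indicates that the perturbed trace still closely tracks the direction of the original, i.e., weaker anonymity, whereas smaller values indicate stronger obfuscation at the cost of utility.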
