Abstract

We present a differentially private actor and its eligibility trace in an actor-critic approach. In this approach, the actor takes actions by interacting directly with the environment, whereas the critic estimates only the state values obtained through bootstrapping; the actor's parameters therefore encode more detailed information about the sequence of taken actions than the critic's, and their corresponding eligibility traces share the same property. It is thus necessary to preserve the privacy of the actor and its eligibility trace when training on private or sensitive data. In this paper, we confirm the applicability of differential privacy methods to actors updated with the policy gradient algorithm and discuss the advantages of this approach over differentially private critic learning. In addition, we measure the cosine similarity between the eligibility trace with differential privacy applied and the non-private eligibility trace to analyze whether anonymity is adequately protected in the differentially private actor or critic. We conduct experiments on two synthetic examples imitating real-world problems in the medical and autonomous-navigation domains, and the results confirm the feasibility of the proposed method.

Highlights

  • Reinforcement learning (RL) defines the steps and procedures required to map situations to actions so as to maximize an accumulated reward signal [1] and serves as a practical framework for decision-making problems

  • We propose a method to protect the privacy of sensitive data corresponding to an actor and its eligibility trace during training in the actor-critic approach

  • We measured the anonymity of the eligibility trace vectors when differential privacy (DP) was applied, using cosine similarity

Summary

Introduction

Reinforcement learning (RL) defines the steps and procedures required to map situations to actions so as to maximize an accumulated reward signal [1], and it serves as a practical framework for decision-making problems. As diverse RL-based technologies are developed and deployed across computer science, the demand for private or sensitive data grows. Rather than using such raw data as they are, it is necessary to prevent personal privacy leakage while maintaining the original data’s utility. To approximate a deterministic real-valued function f : D → R with a differential privacy mechanism, we add noise calibrated to the sensitivity of f, defined as the maximum absolute distance |f(d) − f(d′)| over adjacent input data sets d, d′. Among the widely used Gaussian and Laplace noise mechanisms [2,8,16], the present study employs the Gaussian mechanism, defined as follows:
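In its standard form, the Gaussian mechanism releases M(d) = f(d) + N(0, σ²), where N(0, σ²) denotes zero-mean Gaussian noise; choosing the noise scale σ ≥ √(2 ln(1.25/δ)) · S_f / ε, with S_f the sensitivity of f defined above, guarantees (ε, δ)-differential privacy for ε ∈ (0, 1).

As a concrete illustration, the following Python sketch (a minimal example under our own assumptions, not the paper's implementation; the clipping threshold, the privacy parameters, and the stand-in trace vector are all hypothetical) clips an eligibility-trace vector to bound its sensitivity, perturbs it with the Gaussian mechanism above, and reports the cosine similarity between the private and non-private traces, mirroring the anonymity measure used in this study.

```python
# Minimal sketch: Gaussian-mechanism perturbation of an eligibility trace,
# followed by a cosine-similarity check between private and clean traces.
# All parameter values below are illustrative assumptions.
import numpy as np

def clip_l2(vec, clip_norm):
    """Rescale vec so its L2 norm is at most clip_norm, bounding the sensitivity."""
    norm = np.linalg.norm(vec)
    return vec * min(1.0, clip_norm / (norm + 1e-12))

def gaussian_mechanism(vec, sensitivity, epsilon, delta, rng):
    """Add zero-mean Gaussian noise with sigma = sqrt(2 ln(1.25/delta)) * S_f / epsilon,
    which gives (epsilon, delta)-DP for epsilon in (0, 1)."""
    sigma = np.sqrt(2.0 * np.log(1.25 / delta)) * sensitivity / epsilon
    return vec + rng.normal(0.0, sigma, size=vec.shape)

def cosine_similarity(a, b):
    """Cosine similarity between the private and non-private trace vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

rng = np.random.default_rng(seed=0)
trace = rng.normal(size=64)              # stand-in for an actor's eligibility trace
clipped = clip_l2(trace, clip_norm=1.0)  # sensitivity assumed bounded by the clip norm
private = gaussian_mechanism(clipped, sensitivity=1.0, epsilon=0.5, delta=1e-5, rng=rng)
print(f"cosine similarity (private vs. clean): {cosine_similarity(private, clipped):.3f}")
```

Under this reading, a cosine similarity close to 1 indicates that the perturbed trace still closely tracks the direction of the original, i.e., weaker anonymity, whereas smaller values indicate stronger obfuscation at the cost of utility.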
