Abstract

Artificial intelligence (AI) applications such as power grid control and energy management in building automation rely on two off-policy deep reinforcement learning (DRL) algorithms: deep Q-networks (DQNs) and deep deterministic policy gradients (DDPGs). Most studies on improving the stability of DRL address it with replay buffers and a target network that uses a delayed temporal difference (TD) backup, minimizing a loss function at every iteration. Although separate loss functions were developed for DQN and DDPG, few studies have attempted to improve the loss functions themselves. We therefore modify the loss function based on a temporal consistency (TC) loss and adapt the proposed TC loss function to the target network update in both DQN and DDPG. In experiments on the OpenAI Gym "cart-pole" and "pendulum" environments, the proposed TC loss function substantially improves convergence speed and performance, most notably in the critic network of DDPG.
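The exact form of the proposed TC loss is not given in this excerpt. The sketch below only illustrates the general idea of adding a temporal-consistency penalty to the standard target-network TD loss of a DQN: the online network's value at the next state is kept close to the frozen target network's value there. All names (`q_net`, `target_net`, `tc_weight`) are illustrative assumptions, and the paper's actual formulation may differ.

```python
import torch
import torch.nn.functional as F

def dqn_loss_with_tc(q_net, target_net, batch, gamma=0.99, tc_weight=1.0):
    """Standard DQN TD loss plus an illustrative temporal-consistency (TC) penalty.

    The TC term discourages the online network's estimate at the next state
    from drifting away from the target network's estimate, which is the
    general idea behind temporal-consistency losses.
    """
    s, a, r, s_next, done = batch  # tensors sampled from a replay buffer

    # Online Q-value for the taken action: Q(s, a)
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Delayed TD target from the target network: r + gamma * max_a' Q_target(s', a')
        q_next_target = target_net(s_next).max(dim=1).values
        td_target = r + gamma * (1.0 - done) * q_next_target

    td_loss = F.mse_loss(q_sa, td_target)

    # Illustrative TC penalty: keep the online value at the next state close to
    # the target network's value, so each update stays temporally consistent.
    q_next_online = q_net(s_next).max(dim=1).values
    tc_loss = F.mse_loss(q_next_online, q_next_target)

    return td_loss + tc_weight * tc_loss
```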

Highlights

  • Promising results have been achieved in the field of deep reinforcement learning (DRL), which combines reinforcement learning (RL) [1] and deep learning (DL) [2]

  • We believe that the proposed temporal consistency (TC)-deep Q-networks (DQNs) and TC-deep deterministic policy gradients (DDPGs) could be useful in applications such as autonomous voltage control of the power grid and load shifting in a cooling supply system [10,11]

  • We propose a novel TC loss function based on a previously developed TC loss and adapt it to the target network updates of both DQN and DDPG, in particular the critic network of DDPG



Introduction

Promising results have been achieved in the field of deep reinforcement learning (DRL), which combines reinforcement learning (RL) [1] and deep learning (DL) [2]. RL provides a framework for learning a behavioral policy that maximizes value when controlling unknown, complex environments, and DRL applies DL by using a deep neural network (DNN) as the function approximator for RL. DRL has enabled remarkable applications, most notably AlphaGo for the game of Go [3]. Two well-established model-free, off-policy representatives in DRL are the deep Q-network (DQN) [4,5] for discrete action spaces and the deep deterministic policy gradient (DDPG) [6] for continuous action spaces.
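As a point of reference for the continuous-action case, the sketch below shows the standard DDPG critic update as described in [6]: a delayed TD backup computed through the target actor and target critic. This is the loss that the proposed TC term later augments; the variable names (`critic`, `critic_target`, `actor_target`) are illustrative.

```python
import torch
import torch.nn.functional as F

def ddpg_critic_loss(critic, critic_target, actor_target, batch, gamma=0.99):
    """Standard DDPG critic loss: a delayed TD backup through the target networks."""
    s, a, r, s_next, done = batch  # sample drawn from the replay buffer

    # Current critic estimate Q(s, a) for continuous actions.
    q_sa = critic(s, a)

    with torch.no_grad():
        # Target actor selects a' = mu_target(s'); target critic evaluates it.
        a_next = actor_target(s_next)
        td_target = r + gamma * (1.0 - done) * critic_target(s_next, a_next)

    return F.mse_loss(q_sa, td_target)
```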
