Abstract

With the advent of the era of artificial intelligence, deep reinforcement learning (DRL) has achieved unprecedented success in high-dimensional, large-scale artificial intelligence tasks. However, the insecurity and instability of DRL algorithms significantly affect their performance. The Soft Actor-Critic (SAC) algorithm updates its policy and value networks in a way that alleviates some of these problems, but SAC still suffers from overestimation. To reduce the error caused by overestimation in SAC, we propose a new variant of the algorithm called Averaged-SAC. By averaging the previously learned action-value estimates, it reduces the overestimation bias of soft Q-learning, which contributes to a more stable training process and improved performance. We evaluate Averaged-SAC on six continuous-control tasks in the MuJoCo environment. The experimental results show that Averaged-SAC effectively improves both the performance of the SAC algorithm and the stability of its training process.
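
To make the core idea concrete, the following is a minimal sketch (in PyTorch) of how an averaged soft Q-learning target could be computed, assuming, in the spirit of Averaged-DQN, that the K most recent snapshots of the soft Q-network are kept and their estimates averaged. The names q_snapshots, policy, alpha, and gamma are illustrative assumptions, not the authors' implementation.

```python
import torch

def averaged_soft_q_target(q_snapshots, policy, reward, next_state, done,
                           gamma=0.99, alpha=0.2):
    """Soft Q-learning target with an averaged action-value estimate.

    Illustrative sketch: q_snapshots is a list of the K most recent
    Q-network snapshots; policy.sample is assumed to return an action
    and its log-probability under the current policy.
    """
    with torch.no_grad():
        next_action, log_prob = policy.sample(next_state)
        # Average the action-value estimates of the K stored snapshots
        # instead of relying on a single (overestimating) target network.
        q_values = torch.stack([q(next_state, next_action) for q in q_snapshots])
        avg_q = q_values.mean(dim=0)
        # Soft state value: averaged Q minus the entropy term, as in SAC.
        soft_value = avg_q - alpha * log_prob
        return reward + gamma * (1.0 - done) * soft_value
```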

Highlights

  • Generating fully autonomous agents that learn to automate behaviors by interacting with an experimental environment through trial and error is one of the most important tasks in the current field of artificial intelligence

  • The results show that Averaged-Soft Actor-Critic (Averaged-SAC) outperforms SAC on all six MuJoCo tasks

  • The training curves show that the Averaged-SAC loss values on all six MuJoCo tasks remain lower than those of SAC throughout training, indicating that the proposed Averaged-SAC model reduces the error caused by overestimation and substantially benefits soft Q-learning

Introduction

Generating fully autonomous agents that learn to automate behaviors by interacting with an experimental environment through trial and error is one of the most important tasks in the current field of artificial intelligence. A long-standing challenge in this field is to create intelligent systems that respond to their environments in a timely manner. Such systems include robots that interact with their physical surroundings and software-based agents that interact with multimedia devices. Deep reinforcement learning (DRL), whose mathematical framework is experience-driven autonomous learning, is the most important approach for addressing these challenges [1]. Therefore, improving the security and stability of DRL algorithms across a wide range of artificial intelligence learning tasks is one of the most challenging problems in DRL today. Among policy gradient methods, the Actor-Critic (AC) algorithm is an important example; it is usually classified as a PG method [16].
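
For reference, the following is a minimal, illustrative one-step Actor-Critic update in PyTorch. It is a generic sketch of the AC structure mentioned above, not the algorithm studied in this paper; it assumes that actor(state) returns a torch.distributions object and that critic(state) returns a state-value estimate.

```python
import torch

def actor_critic_update(actor, critic, actor_opt, critic_opt,
                        state, action, reward, next_state, done, gamma=0.99):
    # Critic: regress the state value toward a one-step TD target.
    with torch.no_grad():
        td_target = reward + gamma * (1.0 - done) * critic(next_state)
    value = critic(state)
    critic_loss = (td_target - value).pow(2).mean()
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: policy gradient weighted by the TD advantage.
    advantage = (td_target - value).detach()
    log_prob = actor(state).log_prob(action)
    actor_loss = -(log_prob * advantage).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()
```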
