Natural Actor-Critic Algorithm Research Articles

Actor-critic style two-time-scale algorithms are among the most popular methods in reinforcement learning and have seen great empirical success. However, their performance is not completely understood theoretically. In this paper, we characterize the global convergence of an online natural actor-critic algorithm in the tabular setting using a single trajectory of samples. Our analysis applies to very general settings, as we only assume ergodicity of the underlying Markov decision process. To ensure enough exploration, we employ an $\epsilon$-greedy sampling of the trajectory. For a fixed and small enough exploration parameter $\epsilon$, we show that the two-time-scale natural actor-critic algorithm has a rate of convergence of $\tilde{\mathcal{O}}(1/T^{1/4})$, where $T$ is the number of samples, and this leads to a sample complexity of $\tilde{\mathcal{O}}(1/\delta^{8})$ samples to find a policy that is within an error of $\delta$ from the global optimum. Moreover, by carefully decreasing the exploration parameter $\epsilon$ as the iterations proceed, we present an improved sample complexity of $\tilde{\mathcal{O}}(1/\delta^{6})$ for convergence to the global optimum.
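As a rough illustration of the kind of algorithm this abstract describes, the following is a minimal sketch of a tabular two-time-scale natural actor-critic loop run on a single trajectory. It is not the paper's exact algorithm: the discounted objective, the SARSA-style critic, the step-size exponents, the mixing of the softmax policy with a uniform policy as the exploration scheme, and all names (`natural_actor_critic`, `behaviour_policy`, `P`, `R`) are assumptions made for this example. It relies on the standard fact that, for a tabular softmax policy, the natural policy gradient step reduces to adding a multiple of the action values to the logits.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def behaviour_policy(theta, eps):
    """Exploration (assumed form): mix the softmax policy with a uniform one."""
    n_actions = theta.shape[1]
    return (1.0 - eps) * softmax(theta) + eps / n_actions

def natural_actor_critic(P, R, gamma=0.9, eps=0.1, T=5000, seed=0):
    """Sketch of tabular natural actor-critic on one sample trajectory.

    P: transition probabilities, shape (S, A, S).
    R: rewards, shape (S, A).
    Returns the learned softmax policy and the critic's Q table.
    """
    rng = np.random.default_rng(seed)
    S, A = R.shape
    theta = np.zeros((S, A))  # actor: softmax logits
    Q = np.zeros((S, A))      # critic: tabular action-value estimates
    s = 0
    a = rng.choice(A, p=behaviour_policy(theta, eps)[s])
    for t in range(1, T + 1):
        alpha = 1.0 / t ** 0.6  # critic step size (fast time scale)
        beta = 1.0 / t ** 0.9   # actor step size (slow): beta / alpha -> 0
        s_next = rng.choice(S, p=P[s, a])
        a_next = rng.choice(A, p=behaviour_policy(theta, eps)[s_next])
        # Critic: one temporal-difference update on the visited pair.
        td_error = R[s, a] + gamma * Q[s_next, a_next] - Q[s, a]
        Q[s, a] += alpha * td_error
        # Actor: natural-gradient step for tabular softmax. Adding Q rather
        # than the advantage only shifts each row by a constant, which
        # softmax ignores.
        theta += beta * Q
        s, a = s_next, a_next
    return softmax(theta), Q
```

The two step sizes decay at different rates so that the critic tracks the current policy faster than the actor changes it, which is the two-time-scale structure the abstract refers to. The fixed `eps` corresponds to the fixed exploration parameter case; a decaying schedule would replace the constant inside the loop.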

Tunnel ventilation systems provide drivers with a comfortable and safe driving environment by generating sufficient airflow and by diluting noxious contaminants to below an acceptable concentration. For that purpose, tunnel ventilation systems contain mechanical equipment such as jet-fans, blowers and dust collectors. These machines consume large amounts of energy, so an efficient operating algorithm is needed that both saves energy and keeps driving safe. In this paper, a new reinforcement learning (RL) method is applied as the control algorithm. In formulating the reward of the tunnel ventilation system, which is the performance index to be maximized in the RL methodology, two objectives are of primary interest: maintaining an adequate level of pollutants and minimizing power consumption. The RL control algorithm adopted in this research is based on an actor-critic architecture and the natural gradient method. Because it follows the true steepest-ascent direction of the gradient, the natural gradient method is a promising route to improving the efficacy of the actor module. In addition, the recursive least-squares (RLS) method is applied to the critic module to improve the efficiency with which data are used. Extensive simulation studies were performed using actual data collected from an existing tunnel ventilation system. They confirmed that the suggested algorithm achieved the desired control goals and considerably improved performance compared with previously developed RL-based control algorithms.
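The abstract does not spell out how RLS enters the critic, so the following is only a generic sketch of a recursive least-squares update for a linear estimator, the building block such a critic would typically use. The class name `RLSEstimator`, the forgetting factor, the initial covariance scale `delta`, and the use of a plain regression target are all assumptions made for illustration, not details taken from the article.

```python
import numpy as np

class RLSEstimator:
    """Recursive least squares for a linear model: target ~ phi @ w."""

    def __init__(self, dim, forgetting=1.0, delta=1e3):
        self.w = np.zeros(dim)         # weight estimate
        self.P = delta * np.eye(dim)   # inverse-correlation matrix estimate
        self.lam = forgetting          # 1.0 means no forgetting

    def update(self, phi, target):
        """Process one (feature, target) pair and return the prior error."""
        P_phi = self.P @ phi
        gain = P_phi / (self.lam + phi @ P_phi)
        error = target - phi @ self.w
        self.w = self.w + gain * error
        # P is symmetric, so outer(gain, P_phi) equals gain * phi^T * P.
        self.P = (self.P - np.outer(gain, P_phi)) / self.lam
        return error
```

Each sample is folded into the estimate once and then discarded, and the update uses second-order information through `P`. That is the sense in which RLS tends to use data more efficiently than a plain stochastic-gradient critic, at the price of a quadratic cost in the number of features per update.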

Articles published on Natural Actor-Critic Algorithm

Finite-Sample Analysis of Two-Time-Scale Natural Actor–Critic Algorithm

A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic

Robot Manipulation Skills Transfer for Sim-to-Real in Unstructured Environments

Finite-Sample Analysis of Off-Policy Natural Actor–Critic With Linear Function Approximation

Policy oscillation is overshooting

A Survey of Actor-Critic Reinforcement Learning: Standard and Natural Policy Gradients

Tunnel ventilation controller design using an RLS-based natural actor-critic algorithm

Natural actor–critic algorithms

Impedance Learning for Robotic Contact Tasks Using Natural Actor-Critic Algorithm

Natural Actor-Critic
