Variance Reduction in Actor Critic Methods (ACM)

Eric Benhamou

doi:10.2139/ssrn.3424668

Open Access

https://doi.org/10.2139/ssrn.3424668

Journal: SSRN
Publication Date: Jan 1, 2019
Citations: 1

Affiliation: Université Paris Dauphine-PSL, Alpha-1 Foundation

#Actor Critic Methods #Advantage Actor Critic

Abstract

After presenting Actor Critic Methods (ACM), we show that ACM are control variate estimators. Using the projection theorem, we prove that the Q Actor Critic (QAC) and Advantage Actor Critic (AAC, most often referred to as A2C) methods are optimal in the sense of the $L^2$ norm among the control variate estimators spanned by functions conditioned on the current state and action. This straightforward application of Pythagoras' theorem provides a theoretical justification for the strong performance of QAC and A2C in deep policy gradient methods. It also enables us to derive a new formulation of Advantage Actor Critic methods that has lower variance and improves on the traditional A2C method.
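To make the control variate idea in the abstract concrete, the following is a minimal sketch on a toy problem, not the paper's estimator. It assumes a standard normal sample X, a target f(X) = exp(X), and a control variate g(X) = X with known mean 0; the function name `control_variate_demo` and all parameters are hypothetical. The coefficient c = Cov(f, g) / Var(g) is the $L^2$ projection coefficient of f onto g, which is the sense in which a control variate can be called optimal.

```python
import math
import random
import statistics


def control_variate_demo(n=20000, seed=0):
    """Toy control variate estimate of E[exp(X)] for X ~ N(0, 1).

    Illustrative assumption: the control variate is g(X) = X, whose mean (0)
    is known exactly. This is a generic textbook setup, not the paper's method.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]

    f = [math.exp(x) for x in xs]  # samples of the target f(X)
    g = xs                         # samples of the control variate g(X)
    known_mean_g = 0.0             # E[g(X)] is known in closed form

    mean_f = statistics.fmean(f)
    mean_g = statistics.fmean(g)

    # Sample covariance and variance give the L2 projection coefficient.
    cov_fg = sum((a - mean_f) * (b - mean_g) for a, b in zip(f, g)) / (n - 1)
    var_g = statistics.variance(g)
    c = cov_fg / var_g

    # Subtract the centred control variate, scaled by c, from each sample.
    adjusted = [a - c * (b - known_mean_g) for a, b in zip(f, g)]

    return {
        "c": c,
        "plain_mean": mean_f,
        "cv_mean": statistics.fmean(adjusted),
        "plain_var": statistics.variance(f),
        "cv_var": statistics.variance(adjusted),
    }


if __name__ == "__main__":
    result = control_variate_demo()
    for key, value in result.items():
        print(f"{key}: {value:.4f}")
```

Both estimators target the same expectation, exp(1/2), but the per-sample variance of the adjusted estimator is lower because the component of f that is linearly explained by g has been removed. In the actor critic setting described in the abstract, the role of g is played by functions of the current state and action; that mapping is not implemented here.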

Full Text