Using Wasserstein Generative Adversarial Networks for the design of Monte Carlo simulations

Susan Athey, Guido W Imbens, Jonas Metzger, Evan Munro

doi:10.1016/j.jeconom.2020.09.013

Open Access

Journal: Journal of Econometrics
Publication Date: Mar 20, 2021
Citations: 19

Affiliation: Stanford University

Abstract

When researchers develop new econometric methods, it is common practice to compare their performance to that of existing methods in Monte Carlo studies. The credibility of such studies is often limited by the discretion the researcher has in choosing which Monte Carlo designs to report. To improve credibility, we propose using a class of generative models recently developed in the machine learning literature, Generative Adversarial Networks (GANs), which can be used to systematically generate artificial data that closely mimic existing datasets. In combination with existing real datasets, GANs can thus limit the researcher's degrees of freedom in designing Monte Carlo studies, making any comparisons more convincing. In addition, if an applied researcher is concerned with the performance of a particular statistical method on a specific dataset (beyond its theoretical properties in large samples), she can use such GANs to assess that performance, e.g., the coverage rate of confidence intervals or the bias of the estimator, using simulated data that closely resemble the exact setting of interest. To illustrate these methods, we apply Wasserstein GANs (WGANs) to the estimation of average treatment effects. In this example, we find that (i) no single estimator outperforms the others in all three settings considered, so researchers should tailor their analytic approach to a given setting; (ii) systematic simulation studies can be helpful for selecting among competing methods in this situation; and (iii) the generated data closely resemble the actual data.
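
As a rough illustration of the kind of assessment the abstract describes (bias of an estimator and coverage of its confidence intervals over repeated simulated datasets), the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical `draw_sample` function standing in for a trained WGAN generator; here it simply draws Gaussian outcomes with a known treatment effect `tau`, so that bias and coverage can be checked against the truth. The function names, sample sizes, and the difference-in-means estimator are all illustrative choices.

```python
import numpy as np


def draw_sample(rng, n, tau=1.0):
    """Hypothetical stand-in for a fitted generator (NOT a WGAN).

    Returns outcomes y and a binary treatment indicator w, with a known
    average treatment effect tau so estimators can be scored against it.
    """
    w = rng.integers(0, 2, size=n)
    y = tau * w + rng.normal(size=n)
    return y, w


def diff_in_means(y, w):
    """Difference-in-means ATE estimate and its (unpooled) standard error."""
    y1, y0 = y[w == 1], y[w == 0]
    est = y1.mean() - y0.mean()
    se = np.sqrt(y1.var(ddof=1) / y1.size + y0.var(ddof=1) / y0.size)
    return est, se


def monte_carlo(n_reps=2000, n=500, tau=1.0, seed=0):
    """Repeat: simulate a dataset, estimate the ATE, record error and coverage."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_reps)
    covered = np.empty(n_reps, dtype=bool)
    for r in range(n_reps):
        y, w = draw_sample(rng, n, tau)
        est, se = diff_in_means(y, w)
        estimates[r] = est
        # Nominal 95% interval: est +/- 1.96 * se
        covered[r] = abs(est - tau) <= 1.96 * se
    return {
        "bias": float(estimates.mean() - tau),
        "rmse": float(np.sqrt(np.mean((estimates - tau) ** 2))),
        "coverage": float(covered.mean()),
    }


results = monte_carlo()
print(results)
```

In the setting the abstract describes, `draw_sample` would be replaced by draws from a generator fitted to a real dataset, and several competing estimators would be run through the same loop so that their bias, RMSE, and coverage can be compared on identical simulated data.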

Full Text