Abstract

Sinkhorn divergence is a symmetric normalization of entropic regularized optimal transport. It is a smooth and continuous divergence that metrizes weak convergence and has excellent geometric properties. We use it as an alternative minimax objective function in formulating generative adversarial networks. The optimization is defined with the Sinkhorn divergence as the objective, under non-convex, non-concave conditions. This work focuses on the optimization’s convergence and stability. We propose a first-order sequential stochastic gradient descent ascent (SeqSGDA) algorithm. Under some mild approximations, the learning converges to local minimax points. Using the structural similarity index measure (SSIM), we supply a non-asymptotic analysis of the algorithm’s convergence rate. Empirical evidence shows a convergence rate that is inversely proportional to the number of iterations when tested on the tiny colour datasets Cats and CelebA with the deep convolutional generative adversarial network (DCGAN) and ResNet architectures. The entropy regularization parameter $\varepsilon$ is approximated by the SSIM tolerance $\epsilon$. We determine the iteration complexity to return an $\epsilon$-stationary point to be $\mathcal{O}\left(\kappa\,\log(\epsilon^{-1})\right)$, where $\kappa$ is a value that depends on the Sinkhorn divergence’s convexity and the minimax step ratio in the SeqSGDA algorithm.
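
For context, the standard definitions from the optimal transport literature (the notation here is generic and may differ slightly from the paper’s own symbols): the entropic OT cost and its symmetric, debiased normalization, the Sinkhorn divergence, are

    \[
      W_{\varepsilon}(\mu,\nu) \;=\; \min_{\pi \in \Pi(\mu,\nu)} \int c(x,y)\,\mathrm{d}\pi(x,y)
      \;+\; \varepsilon\,\mathrm{KL}\!\left(\pi \,\middle\|\, \mu \otimes \nu\right),
    \]
    \[
      S_{\varepsilon}(\mu,\nu) \;=\; W_{\varepsilon}(\mu,\nu)
      \;-\; \tfrac{1}{2}\,W_{\varepsilon}(\mu,\mu) \;-\; \tfrac{1}{2}\,W_{\varepsilon}(\nu,\nu).
    \]

Defined this way, $S_{\varepsilon}$ is nonnegative, vanishes when $\mu = \nu$, and interpolates between the unregularized OT cost as $\varepsilon \to 0$ and a maximum mean discrepancy as $\varepsilon \to \infty$.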

Highlights

  • An alternative to learning generative adversarial networks (GANs) [1] comes from the theory of optimal transport (OT) [2], [3]

  • We show that the sequential stochastic gradient descent ascent (SeqSGDA) algorithm solves the minimax optimization for both convex-concave objectives, that is for p ∈ {1, 2}, and non-convex non-concave (NCNC) objectives, that is for p > 2

  • Note that the Fréchet inception distance (FID) score captures the similarity of the generated samples to the real samples better than the inception score (IS) does; the standard closed-form FID computation is sketched below
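
As background for that comparison, here is a minimal sketch of the usual closed-form FID computation from the literature (this is not the paper’s evaluation code; the feature arrays below are random placeholders standing in for Inception-v3 features):

    import numpy as np
    from scipy import linalg

    def frechet_inception_distance(feats_real, feats_gen):
        # FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^{1/2}),
        # where (mu, Sigma) are the mean and covariance of Inception features.
        mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
        sigma_r = np.cov(feats_real, rowvar=False)
        sigma_g = np.cov(feats_gen, rowvar=False)
        covmean = linalg.sqrtm(sigma_r @ sigma_g)
        if np.iscomplexobj(covmean):
            covmean = covmean.real  # drop tiny imaginary parts from numerical noise
        diff = mu_r - mu_g
        return float(diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean))

    # Random placeholders; in practice the features come from an Inception-v3 pooling layer.
    rng = np.random.default_rng(0)
    print(frechet_inception_distance(rng.normal(size=(512, 64)),
                                     rng.normal(loc=0.5, size=(512, 64))))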


Summary

INTRODUCTION

An alternative to learning generative adversarial networks (GANs) [1] comes from the theory of optimal transport (OT) [2], [3].

1) DIVERGENCE FORMULATION

Two initial models that work in large-scale GANs are SGD-AutoDiff [5] and OTGAN [7]. The former uses $S_{c_{\varphi}}^{\varepsilon}(\mu_{\theta}, \nu)$ as the minimax objective and a maximum mean discrepancy-like 2-Wasserstein ground cost. Unlike SGD-AutoDiff, instead of using the same objective for minimization and maximization, we seek to minimize $W_{c_{\varphi}}^{\varepsilon}(\mu_{\theta}, \nu)$ and maximize $S_{c_{\varphi}}^{\varepsilon}(\mu_{\theta}, \nu)$, with an equal gradient step size for both $G_{\theta}$ and $D_{\varphi}$. With this modification, we show that the SeqSGDA algorithm solves the minimax optimization for both convex-concave objectives, that is for p ∈ {1, 2}, and non-convex non-concave (NCNC) objectives, that is for p > 2. To enforce a Lipschitz constraint, we implement spectral normalization [25] on $D_{\varphi}$, in place of simple weight clipping.
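For illustration only, the following is a minimal sketch of such a sequential update under my own simplifying assumptions: a toy 2-D dataset, a tiny generator and a spectrally normalized critic standing in for the DCGAN/ResNet architectures, an equal SGD step size for both players, and a generic log-domain Sinkhorn routine written for this sketch rather than the authors’ implementation. The critic takes an ascent step on the Sinkhorn divergence, then the generator takes a descent step of the same size on the entropic cost.

    import math
    import torch
    import torch.nn as nn
    from torch.nn.utils import spectral_norm

    def cost_matrix(x, y, p=2):
        # Pairwise ground cost c(x_i, y_j) = ||x_i - y_j||^p, written without torch.cdist
        # so the gradient stays well defined when x and y contain coincident points.
        sq = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
        return sq if p == 2 else (sq + 1e-12) ** (p / 2)

    def entropic_ot(x, y, eps=0.1, p=2, n_iters=30):
        # Log-domain Sinkhorn iterations between two uniform empirical measures;
        # returns the transport cost <pi, C> at the (approximate) optimal plan pi.
        C = cost_matrix(x, y, p)
        n, m = C.shape
        log_mu = torch.full((n,), -math.log(n))
        log_nu = torch.full((m,), -math.log(m))
        f, g = torch.zeros(n), torch.zeros(m)
        for _ in range(n_iters):
            f = -eps * torch.logsumexp(log_nu + (g - C) / eps, dim=1)
            g = -eps * torch.logsumexp(log_mu[:, None] + (f[:, None] - C) / eps, dim=0)
        log_pi = log_mu[:, None] + log_nu[None, :] + (f[:, None] + g[None, :] - C) / eps
        return (log_pi.exp() * C).sum()

    def sinkhorn_div(x, y, **kw):
        # Symmetric, debiased Sinkhorn divergence built from the entropic OT cost.
        return (entropic_ot(x, y, **kw)
                - 0.5 * entropic_ot(x, x, **kw) - 0.5 * entropic_ot(y, y, **kw))

    # Toy stand-ins for the generator G_theta and the spectrally normalized critic D_phi.
    G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))
    D = nn.Sequential(spectral_norm(nn.Linear(2, 32)), nn.ReLU(),
                      spectral_norm(nn.Linear(32, 16)))

    eta = 1e-3  # equal gradient step size for both players
    opt_g = torch.optim.SGD(G.parameters(), lr=eta)
    opt_d = torch.optim.SGD(D.parameters(), lr=eta)
    real = torch.randn(128, 2) + torch.tensor([2.0, 0.0])  # stand-in "real" samples

    for step in range(200):
        z = torch.randn(128, 8)

        # 1) Critic ascent: maximize the Sinkhorn divergence in the critic's feature space.
        opt_d.zero_grad()
        (-sinkhorn_div(D(G(z).detach()), D(real))).backward()
        opt_d.step()

        # 2) Generator descent: minimize the entropic OT cost with the updated critic.
        opt_g.zero_grad()
        entropic_ot(D(G(z)), D(real)).backward()
        opt_g.step()

The two players update one after the other with the same step size, and the spectral_norm wrappers enforce the Lipschitz constraint on the critic in place of weight clipping.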

A BRIEF REVIEW OF OPTIMAL TRANSPORT
EOT DUAL FORMULATION
EOT SEMI-DUAL FORMULATION
SEQUENTIAL GAME
SEQUENTIAL SGDA ALGORITHM
EXPERIMENTS AND RESULTS
NCNC GROUND COSTS
SeqSGDA CONVERGENCE RATE
INSIGHTS AND CONCLUSION