Trust region policy optimization (TRPO) is one of the landmark policy optimization algorithms in deep reinforcement learning. It maximizes a surrogate objective based on an advantage function, subject to a bound on the Kullback–Leibler (KL) divergence between two consecutive policies. Although the algorithm has been applied successfully many times in the literature, it has often been criticized because its strict divergence constraint can suppress exploration in some environments. Consequently, many researchers instead add an entropy regularization term to the expected discounted return or to the surrogate objective. Whether there is an alternative strategy for regularizing TRPO, however, remains an open question. In this paper, we present one. Our strategy is to regularize the KL divergence constraint itself with Shannon entropy. This relaxation enlarges the allowable difference between two consecutive policies and yields a new TRPO scheme with an entropy-regularized KL divergence constraint. The surrogate objective and the Shannon entropy are then approximated linearly, while the KL divergence is expanded quadratically, and an efficient conjugate gradient procedure solves the two resulting systems of linear equations; this yields a detailed code-level implementation that supports a fair experimental comparison. Extensive experiments on eight benchmark environments demonstrate that the proposed method outperforms both the original TRPO and TRPO with an entropy-regularized objective. Further, theoretical and experimental analysis shows that the three TRPO-like methods have the same time complexity and comparable computational cost.
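The abstract does not give the exact form of the regularized constraint; a plausible formalization, assuming the entropy term $H(\pi_\theta)$ relaxes the KL bound with a coefficient $\beta > 0$, is
\[
\max_{\theta}\; L_{\theta_{\text{old}}}(\theta)
\quad \text{s.t.} \quad
\bar{D}_{\mathrm{KL}}\!\left(\theta_{\text{old}}, \theta\right) - \beta\, H(\pi_\theta) \le \delta .
\]
Linearizing $L$ and $H$ around $\theta_{\text{old}}$ (absorbing the constant $H(\pi_{\theta_{\text{old}}})$ into $\delta$) and expanding the KL term to second order gives
\[
\max_{s}\; g^{\top} s
\quad \text{s.t.} \quad
\tfrac{1}{2}\, s^{\top} F s - \beta\, b^{\top} s \le \delta ,
\]
where $s = \theta - \theta_{\text{old}}$, $g = \nabla_\theta L$, $b = \nabla_\theta H$, and $F$ is the Fisher information matrix (the Hessian of the averaged KL divergence). Under this sketch, the KKT conditions involve $F^{-1} g$ and $F^{-1} b$, so conjugate gradient would be run on the two linear systems $F x = g$ and $F y = b$, which is consistent with the "two sets of linear equations" mentioned above; the coefficient $\beta$ and the sign convention of the entropy term are assumptions, not details taken from the abstract.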