Optimal Stationary Policy Research Articles

Problem definition: Managers in ad agencies are responsible for delivering digital ads to viewers on behalf of advertisers, subject to the terms specified in the ad campaigns. They need to develop bidding policies to obtain viewers on an ad exchange and allocate them to the campaigns to maximize the agency’s profits, subject to the goals of the ad campaigns.

Academic/practical relevance: Determining a rigorous solution methodology is complicated by uncertainties in the arrival rates of viewers and campaigns, as well as uncertainty in the outcomes of bids on the ad exchange. In practice, ad hoc strategies are often deployed. Our methodology jointly determines optimal bidding and viewer-allocation strategies and obtains insights about the characteristics of the optimal policies.

Methodology: New ad campaigns and viewers are treated as Poisson arrivals, and the resulting model is a Markov decision process, where the state of the system is the number of undelivered impressions in queue for each campaign type in each period. We develop solution methods for bid optimization and viewer allocation and perform a sensitivity analysis with respect to the key problem parameters.

Results: We solve for the optimal dynamic, state-dependent bidding and allocation policies as a function of the number of ad impressions in queue, for both the finite horizon and steady-state cases. We show that the resulting optimization problems are strictly concave in the decision variables and develop and evaluate a heuristic method that can be applied to large problems.

Managerial implications: Numerical analysis of our heuristic solution shows that its errors are generally small and that the optimal dynamic, state-dependent bidding policies obtained by our model are significantly better than optimal static policies. Our proposed approach is managerially attractive because it is easy to implement in practice. We identify the capacity of the impression queue as an important managerial control lever and show that it can be more effective than using higher bids to reduce delay penalties. We quantify potential operational benefits from the consolidation of ad campaigns, as well as merging ad exchanges.

Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2022.1142.
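The finite-horizon, state-dependent bidding idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' model: it assumes a single campaign type, at most one new impression demand per period (a Bernoulli stand-in for Poisson arrivals), one viewer opportunity per period, a first-price payment on winning, and a hypothetical win-probability curve `win_prob`. All parameter names and values are illustrative assumptions.

```python
import math

def win_prob(bid, k=1.0):
    # Hypothetical bid-landscape curve (an assumption, not from the paper):
    # the chance of winning a viewer rises with the bid.
    return 1.0 - math.exp(-k * bid)

def solve(horizon=20, capacity=5, arrive=0.4, revenue=2.0, delay_cost=0.3,
          bids=(0.0, 0.25, 0.5, 0.75, 1.0, 1.5)):
    """Backward induction on the queue length.

    Returns (value, policy), where value[t][q] is the expected profit-to-go
    and policy[t][q] is the best bid in period t with q queued impressions.
    """
    value = [[0.0] * (capacity + 1) for _ in range(horizon + 1)]
    policy = [[0.0] * (capacity + 1) for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for q in range(capacity + 1):
            best_val, best_bid = -math.inf, 0.0
            for b in bids:
                # With an empty queue there is nothing to deliver.
                p = win_prob(b) if q > 0 else 0.0
                total = 0.0
                for won, pw in ((1, p), (0, 1.0 - p)):
                    for new, pa in ((1, arrive), (0, 1.0 - arrive)):
                        # Demand beyond the queue capacity is dropped
                        # (a simplification of this sketch).
                        nxt = min(q - won + new, capacity)
                        reward = won * (revenue - b) - delay_cost * q
                        total += pw * pa * (reward + value[t + 1][nxt])
                if total > best_val:
                    best_val, best_bid = total, b
            value[t][q], policy[t][q] = best_val, best_bid
    return value, policy

if __name__ == "__main__":
    value, policy = solve()
    for q, bid in enumerate(policy[0]):
        print(f"queue={q}: bid={bid}")
```

Printing `policy[0]` shows how the chosen bid varies with the queue length in the first period, which is the sense in which the policy is state-dependent. Changing `capacity` gives a rough way to experiment with the queue-capacity lever mentioned above.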


We consider a liquidation problem in which a risk-averse trader tries to liquidate a fixed quantity of an asset in the presence of market impact and random price fluctuations. The trader faces a trade-off between the transaction costs incurred due to market impact and the volatility risk of holding the position. Our formulation begins with a continuous-time, infinite-horizon variation of the seminal model of Almgren and Chriss (2000), but we define the objective as the conditional value-at-risk (CVaR) of the implementation shortfall and allow for dynamic (adaptive) trading strategies. In this setting, we are able to derive closed-form expressions for the optimal liquidation strategy and its value function. Our results yield a number of important practical insights. We are able to quantify the benefit of adaptive policies over optimized static policies. The relative improvement depends only on the level of risk aversion: for moderate levels of risk aversion, the optimal dynamic policy outperforms the optimal static policy by 5-15%, and outperforms the optimal volume-weighted average price (VWAP) policy by 15-25%. This improvement is achieved through dynamic policies that exhibit "aggressiveness-in-the-money": trading is accelerated when price movements are favorable and slowed when price movements are unfavorable. From a mathematical perspective, our analysis exploits the dual representation of CVaR to convert the problem to a continuous-time, zero-sum game. We leverage the idea of state-space augmentation and obtain a partial differential equation describing the optimal value function, which is separable and a special instance of the Emden-Fowler equation. This leads to a closed-form solution. As our problem is a special case of a linear-quadratic-Gaussian control problem with a CVaR objective, these results may be interesting in broader settings.
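For readers unfamiliar with the dual representation of CVaR mentioned above, the standard textbook form is sketched below. The notation is an assumption for illustration and may differ from the paper's: $X$ is a loss (here, the implementation shortfall), $\alpha \in (0,1)$ is the confidence level, and $P$ is the reference probability measure.

```latex
\[
\mathrm{CVaR}_\alpha(X)
  \;=\; \min_{z \in \mathbb{R}} \Big\{ z + \tfrac{1}{1-\alpha}\,
        \mathbb{E}_P\big[(X - z)^+\big] \Big\}
  \;=\; \sup_{Q \in \mathcal{Q}_\alpha} \mathbb{E}_Q[X],
\qquad
\mathcal{Q}_\alpha = \Big\{ Q \ll P \;:\; \tfrac{dQ}{dP} \le \tfrac{1}{1-\alpha} \Big\}.
\]
```

Minimizing CVaR over trading strategies then becomes a min-max problem: the trader chooses the strategy and an adversary chooses the measure $Q$ from $\mathcal{Q}_\alpha$, which is the zero-sum structure the abstract refers to.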



Articles published on Optimal Stationary Policy

Scalable reinforcement learning approaches for dynamic pricing in ride-hailing systems

Optimal transmission strategy for multiple Markovian fading channels: Existence, structure, and approximation

Maintenance optimization of a two‐component series system considering masked causes of failure

Adaptive discounted control for piecewise deterministic Markov processes

Optimal Sensor Scheduling Under Intermittent Observations Subject to Network Dynamics

A note on the existence of optimal stationary policies for average Markov decision processes with countable states

Risk-Sensitive Average Optimality for Discrete-Time Markov Decision Processes

A Markov Decision Model for Managing Display-Advertising Campaigns

Preventive replacement policy of a system considering multiple maintenance actions upon a failure

Transmission power allocation for remote estimation with multi-packet reception capabilities

Sleep, Sense or Transmit: Energy-Age Tradeoff for Status Update With Two-Threshold Optimal Policy

Risk-Sensitive Optimal Execution via a Conditional Value-at-Risk Objective

Asymptotic Optimality and Rates of Convergence of Quantized Stationary Policies in Continuous‐Time Markov Decision Processes

Managing a Hybrid RDC‐DC Inventory System

Risk-sensitive average continuous-time Markov decision processes with unbounded transition and cost rates

On gradual-impulse control of continuous-time Markov decision processes with exponential utility

Joint optimization of preventive maintenance and production scheduling for multi-state production systems based on reinforcement learning

Integro-differential optimality equations for the risk-sensitive control of piecewise deterministic Markov processes

Discounted Markov Decision Processes with Constrained Costs: the decomposition approach

Asymptotic Evaluations of the Stability Index for a Markov Control Process with the Expected Total Discounted Reward Criterion

