Abstract

Real-time bidding (RTB) is one of the most striking advances in online advertising: websites can sell each ad impression through a public auction, and advertisers can bid on each impression based on its estimated value. In RTB, the bidding strategy is an essential component for advertisers to maximize their revenues (e.g., clicks and conversions). However, most existing bidding strategies may not work well when the RTB environment changes dramatically between the historical and the new ad delivery periods, since they regard the bidding decision as a static optimization problem and derive the bidding function from historical data alone. Thus, recent research suggests using the reinforcement learning (RL) framework to learn an optimal bidding strategy suited to the highly dynamic RTB environment. In this paper, we focus on using model-free reinforcement learning to optimize the bidding strategy. Specifically, we divide an ad delivery period into several time slots. The bidding agent decides each impression's bidding price based on its estimated value and the bidding factor of the time slot in which it arrives. The bidding strategy is thereby simplified to solving for each time slot's optimal bidding factor, which can adapt dynamically to the RTB environment. We exploit the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm to learn each time slot's optimal bidding factor. Finally, an empirical study on a public dataset demonstrates the superior performance and high efficiency of the proposed bidding strategy compared with state-of-the-art baselines.

Highlights

  • In recent years, online advertising has generated a multibillion-dollar market share [1]

  • When a user visits a web page, the script for the ad slot embedded on the page will initiate a bid request for the ad impression to the ad exchange (ADX)

  • The authors of [15] proposed using a model-free reinforcement learning (RL) framework to learn the bidding strategy, called Deep Reinforcement Learning to Bid (DRLB)


Summary

INTRODUCTION

Online advertising has generated a multibillion-dollar market share [1]. A classic linear bidding formula sets the bidding price to the click value multiplied by the impression's predicted CTR. This seemingly optimal strategy may not hold in RTB, since the auction results depend on the market competition, auction volume, and campaign budget [10]. The authors of [15] proposed using a model-free RL framework to learn the bidding strategy, called Deep Reinforcement Learning to Bid (DRLB). They redefined the bidding function as formula (2), where each impression's bidding price depends on its estimated value and the bidding factor of the time slot in which the impression arrives. To guide the TD3 algorithm toward the optimal bidding-factor generation policy efficiently, we design two reward functions based on the cost and user feedback of each time slot.
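The two bidding formulas discussed above can be sketched as follows. This is a minimal illustration, not the paper's exact implementation: the function names and the multiplicative form of the slot-adjusted bid are assumptions, since one common parameterization scales the estimated value by the current slot's factor, but the paper's formula (2) may be parameterized differently.

```python
def static_bid(click_value: float, pctr: float) -> float:
    """Classic linear bidding (formula (1)-style): the bidding price
    equals the click value multiplied by the impression's predicted
    click-through rate (pCTR)."""
    return click_value * pctr


def slot_adjusted_bid(click_value: float, pctr: float,
                      slot_factor: float) -> float:
    """Slot-adjusted bidding (formula (2)-style, hypothetical form):
    the price also depends on the bidding factor of the time slot in
    which the impression arrives; an RL agent (e.g., TD3) updates this
    factor per slot to track the dynamic RTB environment."""
    return click_value * pctr * slot_factor
```

For example, with a click value of 300, a pCTR of 0.01, and a slot factor of 0.5, the static bid is 3.0 while the slot-adjusted bid is 1.5, reflecting the agent bidding more conservatively in that slot.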

RELATED WORK
BASELINE BIDDING STRATEGIES
CONCLUSION