A Reward Shaping Approach for Reserve Price Optimization using Deep Reinforcement Learning

Reza Refaei Afshar, Yingqian Zhang, Jason Rhuggenaath, Uzay Kaymak

doi:10.1109/ijcnn52387.2021.9533817

Abstract

Real-time bidding is the process of selling and buying online advertisements in real-time auctions. These auctions are run by header bidding partners or ad exchanges to sell publishers' ad placements. Ad exchanges run second-price auctions, and a reserve price should be set for each ad placement or impression. This reserve price is normally determined by the bids of header bidding partners. However, the ad exchange may outbid higher reserve prices, so optimizing this value has a large effect on revenue. In this paper, we propose a deep reinforcement learning approach for adjusting the reserve price of individual impressions using contextual information. Normally, ad exchanges do not return any information about the auction except the sold/unsold status. This binary feedback is not suitable for maximizing revenue because it contains no explicit information about the revenue. To enrich the reward function, we develop a novel reward shaping approach that provides an informative reward signal for the reinforcement learning agent. In this approach, different intervals of the reserve price receive different weights, and the reward value of each interval is learned through a search procedure. Using a simulator, we test our method on a set of impressions. The results show that the proposed method achieves higher revenue than the baselines.
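To make the setting concrete, the following is a minimal sketch, not the authors' implementation, of a second-price auction with a reserve price and of an interval-based shaped reward. The function names, the tie-breaking rule (a bid equal to the reserve counts as a sale), the interval edges, and the weight values are all illustrative assumptions; in the paper the per-interval reward values are learned through a search procedure rather than fixed by hand.

```python
import bisect
from typing import List, Tuple


def second_price_with_reserve(bids: List[float], reserve: float) -> Tuple[bool, float]:
    """Return (sold, revenue) for one impression.

    Standard second-price rule with a reserve: the impression sells only if
    the highest bid reaches the reserve, and the winner pays the larger of
    the second-highest bid and the reserve.
    """
    if not bids:
        return False, 0.0
    ordered = sorted(bids, reverse=True)
    if ordered[0] < reserve:  # assumption: a bid equal to the reserve wins
        return False, 0.0
    second = ordered[1] if len(ordered) > 1 else 0.0
    return True, max(second, reserve)


def shaped_reward(
    sold: bool,
    reserve: float,
    edges: List[float],
    weights: List[float],
    unsold_reward: float = 0.0,
) -> float:
    """Hypothetical shaped reward built only from the sold/unsold signal.

    `edges` are sorted interval boundaries for the reserve price and
    `weights` holds one reward value per interval (len(edges) + 1 values).
    A sold impression earns the weight of the interval containing its
    reserve price; an unsold one earns `unsold_reward`.
    """
    if len(weights) != len(edges) + 1:
        raise ValueError("need exactly one weight per interval")
    if not sold:
        return unsold_reward
    return weights[bisect.bisect_right(edges, reserve)]


if __name__ == "__main__":
    # The agent observes only `sold`; `revenue` stays hidden on the exchange side.
    sold, revenue = second_price_with_reserve([1.0, 0.4], reserve=0.6)
    reward = shaped_reward(sold, 0.6, edges=[0.5, 1.0, 2.0], weights=[0.1, 0.5, 1.0, 2.0])
    print(sold, revenue, reward)
```

The sketch shows why binary feedback alone is uninformative: two different reserve prices can both produce `sold = True` while yielding different revenue, and weighting the reserve price intervals is one way to let the reward reflect that difference.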

Full Text