Abstract

In this paper, we propose an extensible framework for model-free reinforcement learning (RL) in real-time bidding (RTB) for display advertising. The framework can be applied to simple environments and extended to the comprehensive setting in which the demand-side platform (DSP) bids on behalf of multiple advertisers simultaneously. To process new information collected through real-time interaction with the environment, we first introduce an extensible model based on the distribution of the recharging probability. We also devote substantial effort to designing a reward function that alleviates the sparsity of the click signal. Unlike prior works, which assumed that the distributions of the feature vectors and the dealing price were known in advance, the proposed scheme is highly feasible and can handle dynamic environments. Furthermore, we introduce a fund-recharging mechanism that transforms the RTB problem into an endless (continuing) task, which allows the policy to be optimized in a farsighted rather than a myopic manner. Experiments on both small- and large-scale real-world datasets demonstrate that the proposed framework achieves state-of-the-art performance on this problem.
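The abstract does not specify how fund recharging is triggered, so the following is only a minimal, hypothetical sketch of the general idea, not the authors' model. It assumes that once the budget is fully spent, it is refilled to a fixed `amount` with a per-step probability `p_recharge`. The class name `RechargingBudget`, its parameters, and the helper `discounted_return` are illustrative inventions. Because the task never terminates, the agent's objective is a discounted return over an unbounded horizon rather than the total reward of a single budget-limited episode.

```python
import random


class RechargingBudget:
    """Hypothetical fund-recharging budget for an RTB agent (illustrative only).

    In episodic RTB, bidding stops once the budget is spent. Here an exhausted
    budget may be refilled with probability p_recharge at each step, so the
    task has no terminal state.
    """

    def __init__(self, amount: float, p_recharge: float, seed: int = 0):
        self.amount = amount          # assumed recharge size, equal to the initial budget
        self.p_recharge = p_recharge  # assumed per-step refill probability while broke
        self.remaining = amount
        self.rng = random.Random(seed)

    def pay(self, price: float) -> bool:
        """Charge the price of a won auction; return False if funds are insufficient."""
        if price > self.remaining:
            return False
        self.remaining -= price
        return True

    def maybe_recharge(self) -> bool:
        """Refill an exhausted budget with probability p_recharge."""
        if self.remaining <= 0 and self.rng.random() < self.p_recharge:
            self.remaining = self.amount
            return True
        return False


def discounted_return(rewards, gamma: float) -> float:
    """Discounted sum of rewards, the usual objective for a continuing task."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

With `gamma` close to 1, a click obtained after a future recharge still contributes substantially to the return. A policy that maximizes this quantity therefore has an incentive to reason beyond the current budget cycle, which matches the farsighted optimization the abstract describes.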
