Abstract

Real-time bidding (RTB) has become a major paradigm of display advertising. Each ad impression generated from a user visit is auctioned in real time, and the demand-side platform (DSP) automatically sets a bid price, usually by estimating the value of the ad impression and then determining the optimal bid price. However, current bidding strategies overlook the randomness of user behaviors (e.g., clicks) and the cost uncertainty caused by auction competition. In this work, we propose a novel adaptive risk-aware bidding algorithm with a budget constraint via reinforcement learning, which is the first to simultaneously consider estimation uncertainty and the dynamic risk tendency of a DSP. Specifically, we explicitly factor in the uncertainty of estimated ad impression values and model the risk preference of a DSP under a specific state and market environment via a sequential decision process. Additionally, we theoretically unveil the intrinsic relation between the uncertainty and the risk tendency based on value at risk (VaR). Consequently, we propose two instantiations to model risk tendency: an expert knowledge-based formulation embracing three essential properties, and an adaptive learning method based on self-supervised reinforcement learning. We conduct experiments on public datasets and show that the proposed framework achieves better performance in terms of the number of clicks under different budget constraints.
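As a rough illustration of how estimation uncertainty and risk tendency can be linked through value at risk, the Python sketch below treats the estimated click-through rate (CTR) of an impression as Gaussian with a known mean and standard deviation. It takes the quantile of that distribution at a risk-tendency level alpha as a risk-adjusted value and plugs that value into a standard linear bidding rule capped by the remaining budget. The Gaussian model, the linear bidding rule, the toy pacing rule for alpha, and every function and parameter name (risk_adjusted_value, toy_risk_tendency, linear_bid, base_bid, avg_ctr) are assumptions made for illustration. This is not the paper's algorithm, which derives the risk tendency from expert knowledge or learns it with self-supervised reinforcement learning.

```python
from statistics import NormalDist


def risk_adjusted_value(ctr_mean: float, ctr_std: float, alpha: float) -> float:
    """VaR-style value: the alpha-quantile of a Gaussian CTR estimate.

    alpha < 0.5 gives a pessimistic (risk-averse) value, alpha > 0.5 an
    optimistic (risk-seeking) one, and alpha == 0.5 returns the mean.
    The Gaussian model of estimation uncertainty is an assumption.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if ctr_std < 0.0:
        raise ValueError("ctr_std must be non-negative")
    if ctr_std == 0.0:
        q = ctr_mean  # no uncertainty: every quantile equals the mean
    else:
        q = NormalDist(mu=ctr_mean, sigma=ctr_std).inv_cdf(alpha)
    return min(max(q, 0.0), 1.0)  # keep the result a valid click probability


def toy_risk_tendency(budget_left_ratio: float, time_left_ratio: float,
                      spread: float = 0.2) -> float:
    """Hypothetical pacing rule (NOT the paper's formulation).

    If a larger share of the budget than of the time remains, the DSP is
    underspending, so it leans risk-seeking (alpha > 0.5); otherwise it
    leans risk-averse. The result is clipped to [0.01, 0.99].
    """
    pace = budget_left_ratio - time_left_ratio
    return min(max(0.5 + spread * pace, 0.01), 0.99)


def linear_bid(value: float, base_bid: float, avg_ctr: float,
               remaining_budget: float) -> float:
    """Linear bidding: scale a base bid by value / average CTR, capped by budget."""
    return min(base_bid * value / avg_ctr, remaining_budget)


if __name__ == "__main__":
    # Hypothetical episode state: 60% of the budget left with 40% of the time left.
    alpha = toy_risk_tendency(budget_left_ratio=0.6, time_left_ratio=0.4)
    value = risk_adjusted_value(ctr_mean=0.002, ctr_std=0.0005, alpha=alpha)
    bid = linear_bid(value, base_bid=80.0, avg_ctr=0.0015, remaining_budget=5000.0)
    print(f"alpha={alpha:.3f} value={value:.5f} bid={bid:.2f}")
```

Taking a lower quantile shrinks bids on impressions whose value estimate is uncertain, which is the risk-averse behavior a VaR criterion expresses; letting alpha vary with the budget and time state is one simple way to make that tendency dynamic rather than fixed.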
