Abstract

Resource allocation is a central problem in operating bike-sharing systems, and demand estimation from historical data plays an important role in allocating bikes. However, because observed demand can never exceed the available bike supply, historical pickup data is a supply-censored version of true user demand, so allocation policies designed directly from historical data may degrade in actual online use. Exploring latent user demand is therefore also necessary. In this paper, we study whether the allocation policy can be optimized using observed historical demand (exploitation) while simultaneously exploring latent demand (exploration) during the allocation process. We model this as a censored semi-bandit problem whose goal is to maximize the cumulative number of successful pickups over a multi-round allocation process in which the true user demand is initially unknown. We adopt a nonparametric estimator to estimate user demand from the censored pickup feedback and propose an upper-confidence-bound-based allocation policy that trades off exploitation and exploration of user demand. We prove the convergence of the proposed policy theoretically. Ablation experiments on real-world data sets demonstrate the importance of exploring latent user demand and show that the proposed policy effectively reduces lost demand.
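
The idea of estimating demand from censored pickups and allocating optimistically can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract names neither the estimator nor the confidence bound, so the sketch assumes a Kaplan-Meier-style estimator and a generic UCB bonus of the form sqrt(2 log t / n). The function names `km_survival` and `ucb_allocate`, the `capacity` parameter, and the station labels are all hypothetical. A pickup count equal to the supplied number of bikes is treated as censored (demand was at least that large).

```python
import math


def km_survival(pickups, supplies, max_level):
    """Kaplan-Meier-style estimate of S[k] = P(demand > k) for k = 0..max_level-1.

    An observation is censored when pickup == supply (demand >= supply).
    Returns (S, at_risk), where at_risk[k] counts observations with pickup >= k.
    """
    S, at_risk, surv = [], [], 1.0
    for k in range(max_level):
        # Observations known to have demand >= k.
        n_k = sum(1 for p in pickups if p >= k)
        # Uncensored observations whose demand was exactly k.
        d_k = sum(1 for p, s in zip(pickups, supplies) if p == k and p < s)
        if n_k > 0:
            surv *= 1.0 - d_k / n_k
        # With no data at level k, the last estimate is carried forward.
        S.append(surv)
        at_risk.append(n_k)
    return S, at_risk


def ucb_allocate(history, total_bikes, capacity, t):
    """Greedy heuristic: hand out bikes one at a time by optimistic marginal gain.

    history maps station -> (pickups, supplies) from earlier rounds.
    The marginal value of the (k+1)-th bike at a station is P(demand > k);
    the bonus term (an assumed generic UCB form) favors rarely observed levels.
    """
    est = {s: km_survival(p, q, capacity) for s, (p, q) in history.items()}
    alloc = {s: 0 for s in history}
    for _ in range(total_bikes):
        best, best_gain = None, -1.0
        for s, (S, at_risk) in est.items():
            k = alloc[s]
            if k >= capacity:
                continue  # station is full
            bonus = math.sqrt(2.0 * math.log(max(t, 2)) / max(at_risk[k], 1))
            gain = min(1.0, S[k] + bonus)
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:
            break  # every station is at capacity
        alloc[best] += 1
    return alloc


# Hypothetical history for two stations over three rounds.
history = {
    "A": ([2, 2, 2], [5, 5, 5]),  # never censored: demand was exactly 2
    "B": ([1, 3, 3], [1, 5, 3]),  # rounds 1 and 3 censored by supply
}
allocation = ucb_allocate(history, total_bikes=5, capacity=4, t=3)
```

In this sketch, levels with few observations at risk receive a large bonus, so the policy occasionally supplies more bikes than past pickups suggest; this is what lets it observe demand that earlier supply levels had censored.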
