Abstract

Stochastic simulation is typically used for offline system design and control; the time needed to run a simulation hinders its use in online decision-making. With the rapid growth of computing power, simulation-based online optimization has emerged as an attractive research topic. We consider ranking and selection via simulation in the context of online decision-making, where only a short time window (referred to as the online budget) is available after an online scenario is observed. The goal is to select the best alternative conditional on each scenario. We propose a Unified Offline and Online Learning (UOOL) paradigm that simultaneously exploits offline simulation, online scenarios, and the online simulation budget. Specifically, we model the mean performance of each alternative as a function of the scenario and learn a predictive model from offline data. We then develop a sequential sampling procedure to generate online simulation data, and update the predictive model using both offline and online data. Our theoretical result shows that the online budget should be allocated to the revealed online scenario. Numerical experiments demonstrate the superior performance of the UOOL paradigm and the benefits of offline and online simulation.
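The offline-then-online workflow described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the simulator `simulate`, the three alternatives with linear mean performance, the least-squares model, and the round-robin allocation, which is only a simple stand-in for the paper's sequential sampling procedure. The one element that follows the abstract directly is that every online replication is run at the revealed scenario `x0`.

```python
import numpy as np

# Hypothetical ground truth: mean performance of alternative a is
# INTERCEPTS[a] + SLOPES[a] * x. These values are illustrative only.
INTERCEPTS = [0.0, 1.0, 0.4]
SLOPES = [1.0, -1.0, 0.2]


def simulate(alt, x, rng):
    """Hypothetical simulator: one noisy replication of `alt` at scenario x."""
    return INTERCEPTS[alt] + SLOPES[alt] * x + rng.normal(scale=0.2)


def fit_linear(xs, ys):
    """Least-squares fit of y ~ b0 + b1 * x; returns the vector (b0, b1)."""
    design = np.column_stack([np.ones(len(xs)), xs])
    coef, *_ = np.linalg.lstsq(design, np.asarray(ys), rcond=None)
    return coef


def select_online(x0, offline_n=50, online_budget=30, seed=0):
    """Return (index of selected alternative, predicted means at x0)."""
    rng = np.random.default_rng(seed)
    k = len(INTERCEPTS)

    # Offline phase: simulate each alternative at randomly drawn scenarios.
    data = []
    for alt in range(k):
        xs = list(rng.uniform(0.0, 1.0, offline_n))
        ys = [simulate(alt, x, rng) for x in xs]
        data.append((xs, ys))
    coefs = [fit_linear(xs, ys) for xs, ys in data]

    # Online phase: spend the whole budget at the revealed scenario x0.
    # Round-robin is an assumed placeholder for a sequential sampling rule.
    for t in range(online_budget):
        alt = t % k
        xs, ys = data[alt]
        xs.append(x0)
        ys.append(simulate(alt, x0, rng))
        coefs[alt] = fit_linear(xs, ys)  # refit on offline + online data

    preds = [float(c[0] + c[1] * x0) for c in coefs]
    return int(np.argmax(preds)), preds


best, preds = select_online(x0=0.9)
print("selected alternative:", best)
```

Here "best" is taken to mean the largest predicted mean, which is also an assumption. Refitting on the pooled data lets the online replications at `x0` sharpen the prediction exactly where the decision is made, while the offline data supplies the model's overall shape.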
