Allocating capacity to private cloud computing services is challenging because demand is time-varying, there are often no buffers, and customers can re-submit jobs a finite number of times. We model this setting using a multi-station queueing network in which servers represent CPU cores and jobs that are not processed immediately retry several times. Assuming retrial rates are stationary and that there is a maximum number of retrial attempts, we determine an optimal service capacity and retrial interval under an admission control policy employed by our partner institution, in which the server informs customers when they should next attempt service but does not enforce that timing. We introduce a recursive representation of the offered load that approximates the fluid dynamics of the system. We then use this representation to develop a solution technique that minimizes the total variation in the constructed offered load. We prove that this approach is linked to maximizing system throughput and that, in certain settings, the optimal stationary and time-varying retrial intervals are equivalent. Using a data set of cloud computing requests spanning a 24-hour period, our analysis indicates that the optimal policy prescribes a 10% reduction in capacity. We also investigate the fidelity of the fluid model and the sensitivity of our recommendations to the behavior of retrial jobs. We find that retrial-time announcements allow a provider to satisfy service level agreements while encouraging retrial jobs to be processed during off-peak periods. Further, the policy is suitably robust to a customer’s willingness to comply with the suggested retrial times.