A note on bias optimality in controlled queueing systems

Mark E. Lewis, Martin L. Puterman

doi:10.1239/jap/1014842288

Abstract

The use of bias optimality to distinguish among gain optimal policies was recently studied by Haviv and Puterman [1] and extended in Lewis et al. [2]. In [1], each customer arriving at an M/M/1 queue offers the gatekeeper a reward R. The gatekeeper, whose objective is to ‘maximize’ rewards, must decide whether to admit the customer. If the customer is accepted, the gatekeeper immediately receives the reward but is charged a holding cost, c(s), that depends on the number of customers, s, in the system; the customer then joins the queue and awaits service. Haviv and Puterman [1] showed that there can be only two Markovian, stationary, deterministic gain optimal policies and that only the policy which uses the larger control limit is bias optimal. This showed the usefulness of bias optimality for distinguishing between gain optimal policies. In the same paper, they conjectured that if the gatekeeper receives the reward upon completion of a job instead of upon entry, the bias optimal policy will be the one with the lower control limit. This note confirms that conjecture.
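As an illustration of why the timing of the reward matters only for bias and not for gain, the following minimal sketch computes the long-run average reward (gain) of a control limit policy under both payment conventions. It is not taken from the paper: the function names, the parameter values, and the assumption that c(s) is charged as a cost rate while s customers are present are hypothetical choices made for the example. By flow balance, the admission rate equals the service completion rate in steady state, so the two gains coincide for every control limit.

```python
from fractions import Fraction


def stationary(lam, mu, limit):
    """Stationary distribution on states 0..limit of an M/M/1 queue
    that admits an arrival only while fewer than `limit` customers are present."""
    rho = Fraction(lam) / Fraction(mu)
    weights = [rho ** s for s in range(limit + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gain(lam, mu, limit, reward, cost, at_completion=False):
    """Long-run average reward of the control limit policy.

    `cost(s)` is an assumed holding cost rate in state s (hypothetical form).
    If `at_completion` is True the reward is collected at service completion,
    otherwise at admission.
    """
    pi = stationary(lam, mu, limit)
    if at_completion:
        # Completions occur at rate mu whenever the system is non-empty.
        reward_rate = mu * (1 - pi[0])
    else:
        # Admissions occur at rate lam whenever the system is below the limit.
        reward_rate = lam * (1 - pi[limit])
    holding = sum(p * cost(s) for s, p in enumerate(pi))
    return reward_rate * reward - holding


if __name__ == "__main__":
    # Hypothetical parameters: lam = 1, mu = 2, R = 5, c(s) = s.
    for limit in range(6):
        g_entry = gain(1, 2, limit, 5, lambda s: s)
        g_done = gain(1, 2, limit, 5, lambda s: s, at_completion=True)
        print(limit, g_entry, g_done)
```

Exact rational arithmetic is used so that the equality of the two gains can be checked without floating-point tolerance. Because the gains agree, any difference between the two reward conventions can only show up in the bias.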
