Abstract

SUMMARY In this paper we define a response surface bandit as the sequential design problem of maximizing an expected bandit utility when the outcomes $y_n$ are continuous and can be related through a response surface to a set of controllable variables $x_n = (x_{1n}, x_{2n}, \ldots, x_{kn})$. We link this problem to traditional optimization problems from industrial engineering and to the traditional bandit problem. We consider two approaches to the problem. The first is based on a myopic sequential design. The second uses the best design out of a family of designs related to upper bounds for the predicted surface; the family includes myopic and sequential versions of D-optimal designs. These approaches can be generalized to more broadly defined sequential problems.
