Abstract

The technique of Dynamic Programming for Armed Bandits is employed to maximize the randomly depreciated gains of a store that has an unknown (finite, random) number of clients and a fixed, finite number of sellers. The sellers' skills are also random and are represented by probability distributions that are themselves random. Hence, the Armed Bandits framework is considered with a horizon that is a random variable with finite support, a setting that, as far as the authors know, has not yet been discussed. In addition, numerical examples are detailed to illustrate the versatility and practical implementation of the approach in two general contexts, distinguished by the number of available products. With a single product, the problem coincides with maximizing the number of sales. With more than one product, the amount of sales is not necessarily governed by a Bernoulli distribution.
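
To make the random-horizon idea concrete, here is a minimal sketch of the single-product (Bernoulli) case. It is an illustration under stated assumptions, not the authors' method. Each seller's unknown sale probability is assumed to have a Beta(a, b) prior. The number of clients N is assumed to be independent of the sales outcomes, with a known probability mass function on {0, ..., T}. The random depreciation is simplified to a fixed discount factor gamma. Under the independence assumption, client t is served only if N >= t, and P(N >= s | N >= t) is proportional to P(N >= s) for s >= t. The random horizon therefore reduces to a deterministic horizon T in which the reward at step t is weighted by P(N >= t) * gamma^(t-1). The function name `solve_random_horizon_bandit` and its parameters are hypothetical.

```python
from functools import lru_cache
from typing import Optional, Sequence, Tuple

# Per-seller (successes, failures) observed so far; this is the DP state.
Counts = Tuple[Tuple[int, int], ...]


def solve_random_horizon_bandit(
    prior: Sequence[Tuple[float, float]],  # assumed Beta(a, b) prior per seller
    horizon_pmf: Sequence[float],          # horizon_pmf[n] = P(N = n), n = 0..T
    gamma: float = 1.0,                    # assumed fixed discount (simplified depreciation)
) -> Tuple[float, Optional[int]]:
    """Return (optimal expected discounted sales, best seller for the first client)."""
    t_max = len(horizon_pmf) - 1
    # survival[t] = P(N >= t): client t exists only if at least t clients arrive.
    survival = [sum(horizon_pmf[t:]) for t in range(t_max + 1)]

    def updated(counts: Counts, arm: int, success: bool) -> Counts:
        # Return a new state with one more success or failure recorded for `arm`.
        s, f = counts[arm]
        new = (s + 1, f) if success else (s, f + 1)
        return counts[:arm] + (new,) + counts[arm + 1:]

    def q_value(t: int, counts: Counts, arm: int) -> float:
        # Expected weighted reward of assigning client t to seller `arm`, then acting optimally.
        a, b = prior[arm]
        s, f = counts[arm]
        p = (a + s) / (a + b + s + f)  # posterior mean sale probability of this seller
        weight = survival[t] * gamma ** (t - 1)
        return (weight * p
                + p * value(t + 1, updated(counts, arm, True))
                + (1 - p) * value(t + 1, updated(counts, arm, False)))

    @lru_cache(maxsize=None)
    def value(t: int, counts: Counts) -> float:
        # Bellman recursion over the posterior state; zero beyond the largest possible horizon.
        if t > t_max:
            return 0.0
        return max(q_value(t, counts, arm) for arm in range(len(prior)))

    if t_max < 1:  # no client can arrive
        return 0.0, None
    start: Counts = tuple((0, 0) for _ in prior)
    qs = [q_value(1, start, arm) for arm in range(len(prior))]
    best = max(range(len(prior)), key=qs.__getitem__)
    return qs[best], best


if __name__ == "__main__":
    # Two sellers, at most 3 clients with P(N=1)=0.2, P(N=2)=0.3, P(N=3)=0.5.
    print(solve_random_horizon_bandit([(1, 1), (2, 1)], [0.0, 0.2, 0.3, 0.5], gamma=0.95))
```

The state space grows exponentially with the number of sellers and the maximum horizon, so exact recursion like this is only practical for small instances. The multi-product case, where sales are not Bernoulli, would need a different reward model and posterior update.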
