Abstract

We consider the one-armed bandit problem, i.e., the two-armed bandit problem in which the probability distribution of incomes for the first action is known. Over a known control horizon, one has to determine the more profitable action and ensure its preferential use. We consider the problem as applied to batch processing and, hence, the distributions of incomes are Gaussian. In the Bayesian setting, we obtain a recursive integro-difference equation for computing the Bayesian strategy and Bayesian risk with respect to an arbitrary prior distribution. We then obtain a recursive equation in invariant form with control horizon equal to one and, in the limiting case, a second-order partial differential equation. In the minimax setting, we determine the minimax strategy and minimax risk as the Bayesian ones corresponding to the worst-case prior distribution. The redundancy inherent in the problem shows in the fact that, instead of sequential one-by-one data processing, one can use batch processing with virtually no increase in minimax risk. For example, numerical results show that processing the data in 50 batches increases the minimax risk by only 3%. If the data can be processed in parallel, the total processing time is determined by the number of batches rather than by the total amount of data.
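The batch-processing setup can be illustrated with a minimal simulation sketch. It is not the Bayesian or minimax strategy described in the abstract: it uses a simple, hypothetical threshold rule that abandons the unknown action once its cumulative income falls too far below what the known action would have earned. The function name `batch_regret`, the unit variance, the default parameter values, and the threshold scaled by the square root of the horizon are all assumptions made for illustration.

```python
import random


def batch_regret(m_unknown, m_known=0.0, horizon=5000, batches=50,
                 threshold=1.0, seed=0):
    """Expected loss (pseudo-regret) of a simple threshold rule.

    Assumptions (illustrative, not taken from the source): incomes have
    unit variance, the known action has mean m_known, and the rule
    switches to the known action for good once the cumulative income
    of the unknown action drops below m_known * pulls minus
    threshold * sqrt(horizon).
    """
    if horizon % batches != 0:
        raise ValueError("horizon must be divisible by batches")
    rng = random.Random(seed)
    batch_size = horizon // batches
    best = max(m_unknown, m_known)
    excess = 0.0   # income of the unknown action minus m_known * pulls
    regret = 0.0
    pulls = 0
    for _ in range(batches):
        # The sum of batch_size unit-variance Gaussian incomes is
        # Gaussian with mean m * batch_size and std sqrt(batch_size).
        income = rng.gauss(m_unknown * batch_size, batch_size ** 0.5)
        excess += income - m_known * batch_size
        regret += (best - m_unknown) * batch_size
        pulls += batch_size
        if excess < -threshold * horizon ** 0.5:
            # Use the known action for the rest of the horizon.
            regret += (best - m_known) * (horizon - pulls)
            break
    return regret
```

Averaging `batch_regret` over many seeds for `batches=50` and for `batches=horizon` (one-by-one processing) gives a rough feel for how little is lost by batching under this particular rule. The decision is made only once per batch, which is why the number of batches, not the amount of data, governs processing time when each batch can be handled in parallel.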
