Abstract

We consider the application of the mirror descent algorithm (MDA) to the one-armed bandit problem in the minimax setting, as applied to data processing. This problem is also known as a game with nature, in which the player's payoff function is the mathematical expectation of the total income. The player must determine the more effective of two available methods and ensure that it is predominantly used. In this case, the a priori efficiency of one of the methods is known. This article proposes a modification of the MDA that improves the efficiency of control by using this additional a priori information. The proposed strategy retains the characteristic property of strategies for one-armed bandits: once the known action is applied, it is applied until the end of the control. Modifications are considered both for the algorithm with one-by-one processing and for its batch version. Batch processing is of interest because, if the data within each batch can be processed in parallel, the total processing time is determined by the number of batches rather than by the original amount of data. For the proposed algorithms, Monte Carlo simulation was used to calculate the optimal values of the tunable parameters and to obtain estimates of the minimax risk.
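To make the setting concrete, the following is a minimal Python sketch of how a minimax risk might be estimated by Monte Carlo simulation for a one-armed bandit. It does not implement the article's MDA modification. As a stand-in it uses a simple threshold rule, which is an assumption chosen only because it shares the stopping property described above: once the known action is applied, it is applied until the end. The function names, the Bernoulli income model, the `threshold` parameter, and the grid of unknown-arm probabilities are all hypothetical.

```python
import random


def simulate_regret(p_unknown, p_known, horizon, threshold, rng):
    """Run one control episode and return its regret.

    Illustrative threshold rule (NOT the article's MDA modification):
    play the unknown action while the running sum of (income - p_known)
    stays at or above -threshold; after that, play the known action
    for the rest of the horizon.
    """
    best = max(p_unknown, p_known)
    drift = 0.0             # running sum of (income - p_known) on the unknown action
    expected_income = 0.0   # sum of the expected one-step incomes of the chosen actions
    switched = False
    for _ in range(horizon):
        if switched:
            # Stopping property: the known action is never abandoned.
            expected_income += p_known
            continue
        income = 1.0 if rng.random() < p_unknown else 0.0  # assumed Bernoulli income
        expected_income += p_unknown
        drift += income - p_known
        if drift < -threshold:
            switched = True
    return horizon * best - expected_income


def estimate_risk(p_known, horizon, threshold, grid, runs, seed=0):
    """Approximate the minimax risk as the largest mean regret over a grid."""
    rng = random.Random(seed)
    worst = 0.0
    for p_unknown in grid:
        total = sum(
            simulate_regret(p_unknown, p_known, horizon, threshold, rng)
            for _ in range(runs)
        )
        worst = max(worst, total / runs)
    return worst
```

Under these assumptions, a tunable parameter such as `threshold` could be chosen by evaluating `estimate_risk` over a range of candidate values and keeping the one with the smallest estimated worst-case regret, which mirrors the tuning procedure the abstract describes only in outline.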

