Abstract

Mining a big data stream with continuous, unbounded, and time-varying data items is a great challenge. When computational resources for real-time processing are limited, it is especially hard to select high-value data items from a big data stream. This article studies selection policies based on the multiarmed bandit. We cache the online arriving data items in different buffers according to their characteristics, and these buffers are regarded as the arms of a multiarmed bandit. We pay attention to several key factors in selecting data items, including the data item value, processing time, resource consumption, and the loss caused by discarded items. On this basis, a comprehensive reward mechanism for each data item is given as the foundation for the selection of data items, that is, the gambling decision. We design three selection policies: an improved $\varepsilon$-greedy, an improved upper confidence bound (UCB), and a data item selection policy named dynamic high-reward incentive (DHRI) with an active, dynamic, and incentive-based reward. All of them aim to balance exploitation and exploration in a multiarmed bandit. Experimental results show that our proposed approach is effective and outperforms traditional methods.

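The bandit formulation described above can be illustrated with a minimal sketch. The Python code below treats buffers as arms, uses a hypothetical composite reward that combines item value, processing time, resource consumption, and discard loss, and shows plain $\varepsilon$-greedy and UCB1 arm selection; it is not the paper's improved variants or the DHRI policy, and all names and weights (e.g., `Buffer`, `composite_reward`, `w_value`) are illustrative placeholders rather than the authors' implementation.

```python
import math
import random

class Buffer:
    """One arm of the bandit: a buffer caching stream items with similar characteristics."""
    def __init__(self, name):
        self.name = name
        self.pulls = 0          # how many times this buffer was selected
        self.mean_reward = 0.0  # running average of observed rewards

    def update(self, reward):
        """Incrementally update the running mean reward for this buffer."""
        self.pulls += 1
        self.mean_reward += (reward - self.mean_reward) / self.pulls


def composite_reward(value, proc_time, resource, loss,
                     w_value=1.0, w_time=0.3, w_res=0.3, w_loss=0.4):
    """Hypothetical composite reward: item value minus weighted processing,
    resource, and discard-loss costs (weights are placeholders)."""
    return w_value * value - w_time * proc_time - w_res * resource - w_loss * loss


def select_epsilon_greedy(buffers, epsilon=0.1):
    """Explore a random buffer with probability epsilon, otherwise exploit
    the buffer with the highest observed mean reward."""
    if random.random() < epsilon:
        return random.choice(buffers)
    return max(buffers, key=lambda b: b.mean_reward)


def select_ucb(buffers, total_pulls):
    """Standard UCB1 score: mean reward plus an exploration bonus that
    shrinks as a buffer is selected more often (total_pulls >= 1)."""
    def score(b):
        if b.pulls == 0:
            return float("inf")  # try every buffer at least once
        return b.mean_reward + math.sqrt(2 * math.log(total_pulls) / b.pulls)
    return max(buffers, key=score)
```

In a stream-processing loop, one would select a buffer with either policy, process an item drawn from it, compute the composite reward for that item, and call `update` on the chosen buffer so that later selections reflect the observed value.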