Abstract

Mining a big data stream with continuous, unbounded, and time-varying data items is a great challenge. When computational resources for real-time processing are limited, it is especially hard to select high-value data items from a big data stream. This article studies selection policies based on the multiarmed bandit. We cache the online arriving data items in different buffers according to their characteristics, and these buffers are regarded as the arms of a multiarmed bandit. We pay attention to several key factors in selecting data items, including the data item value, processing time, resource consumption, and the loss caused by discarded items. On this basis, a comprehensive reward mechanism for each data item is given as the foundation for the selection of data items, that is, the gambling decision. We design three selection policies: an improved $\varepsilon$-greedy, an improved upper confidence bound (UCB), and a data item selection policy named dynamic high-reward incentive (DHRI) with an active, dynamic, and incentive-based reward. All of them aim to balance exploitation and exploration in a multiarmed bandit. Experimental results show that our proposed approach is effective and outperforms traditional methods.

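The bandit formulation described above can be illustrated with a minimal sketch. The Python code below treats buffers as arms, uses a hypothetical composite reward that combines item value, processing time, resource consumption, and discard loss, and shows plain $\varepsilon$-greedy and UCB1 arm selection; it is not the paper's improved variants or the DHRI policy, and all names and weights (e.g., `Buffer`, `composite_reward`, `w_value`) are illustrative placeholders rather than the authors' implementation.

```python
import math
import random

class Buffer:
    """One arm of the bandit: a buffer caching stream items with similar characteristics."""
    def __init__(self, name):
        self.name = name
        self.pulls = 0          # how many times this buffer was selected
        self.mean_reward = 0.0  # running average of observed rewards

    def update(self, reward):
        """Incrementally update the running mean reward for this buffer."""
        self.pulls += 1
        self.mean_reward += (reward - self.mean_reward) / self.pulls


def composite_reward(value, proc_time, resource, loss,
                     w_value=1.0, w_time=0.3, w_res=0.3, w_loss=0.4):
    """Hypothetical composite reward: item value minus weighted processing,
    resource, and discard-loss costs (weights are placeholders)."""
    return w_value * value - w_time * proc_time - w_res * resource - w_loss * loss


def select_epsilon_greedy(buffers, epsilon=0.1):
    """Explore a random buffer with probability epsilon, otherwise exploit
    the buffer with the highest observed mean reward."""
    if random.random() < epsilon:
        return random.choice(buffers)
    return max(buffers, key=lambda b: b.mean_reward)


def select_ucb(buffers, total_pulls):
    """Standard UCB1 score: mean reward plus an exploration bonus that
    shrinks as a buffer is selected more often (total_pulls >= 1)."""
    def score(b):
        if b.pulls == 0:
            return float("inf")  # try every buffer at least once
        return b.mean_reward + math.sqrt(2 * math.log(total_pulls) / b.pulls)
    return max(buffers, key=score)
```

In a stream-processing loop, one would select a buffer with either policy, process an item drawn from it, compute the composite reward for that item, and call `update` on the chosen buffer so that later selections reflect the observed value.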