Learning from Multiple Related Data Streams with Asynchronous Flowing Speeds

Zhi Qiao, Peng Zhang, Jing He, Li Guo, Jinghua Yan

doi:10.1109/icmla.2010.47

Abstract

Related data streams are data streams that can be joined together by matching their join attributes. Existing research on learning from related data streams assumes that all streams arrive at a central processing unit synchronously, so that within any sliding window all tuples of the streams can be perfectly joined. This assumption does not hold, however, when related data streams are generated or transferred at different speeds and therefore arrive at the central processing unit asynchronously. In this paper, we argue that asynchronous data streams yield a small portion of perfectly joined examples (i.e., complete examples) and a large portion of partially joined examples (i.e., incomplete examples). Accordingly, we present a new Learning from Complete and Fixed Examples (LCFE) framework that fixes incomplete examples to boost learning. Experiments on both synthetic and real-world data streams demonstrate that LCFE achieves higher prediction accuracy when learning from related data streams than other simple solutions.
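To make the distinction between complete and incomplete examples concrete, here is a minimal sketch of joining the tuples that sit in one sliding window across several streams. It assumes each stream's window contents are given as `(join key, attribute values)` pairs and that a join key appears at most once per stream per window; the helper `join_window` is hypothetical. It only illustrates how asynchronous arrival splits a window into the two kinds of examples and does not implement LCFE's fixing step.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record format: (join key, attribute values) for one tuple.
Record = Tuple[str, List[float]]
# A joined example: the key plus one part per stream (None if not yet arrived).
Example = Tuple[str, List[Optional[List[float]]]]


def join_window(streams: Sequence[Sequence[Record]]) -> Tuple[List[Example], List[Example]]:
    """Join the tuples currently in one sliding window across all streams.

    Returns (complete, incomplete). A complete example has a matching tuple
    from every stream; an incomplete example has None in the slot of each
    stream that has not yet delivered a tuple with that join key.
    Assumes each key occurs at most once per stream (a later duplicate
    would overwrite an earlier one).
    """
    keys: List[str] = []  # first-seen order of join keys
    parts: Dict[str, List[Optional[List[float]]]] = {}
    for i, stream in enumerate(streams):
        for key, values in stream:
            if key not in parts:
                parts[key] = [None] * len(streams)
                keys.append(key)
            parts[key][i] = values

    complete: List[Example] = []
    incomplete: List[Example] = []
    for key in keys:
        example = (key, parts[key])
        if all(p is not None for p in parts[key]):
            complete.append(example)
        else:
            incomplete.append(example)
    return complete, incomplete


if __name__ == "__main__":
    # A fast stream has delivered three tuples; a slow stream only one.
    fast = [("a", [1.0]), ("b", [2.0]), ("c", [3.0])]
    slow = [("a", [0.5])]
    complete, incomplete = join_window([fast, slow])
    print(complete)    # [('a', [[1.0], [0.5]])]
    print(incomplete)  # [('b', [[2.0], None]), ('c', [[3.0], None])]
```

With a fast and a slow stream, only the key present in both windows forms a complete example, while the remaining keys are partially joined; the faster one stream outpaces the other, the larger the incomplete share becomes.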
