Data streams, an important form of big data, require online, real-time processing because instances arrive one at a time and are transient. Existing online learning methods rely on particular assumptions, such as a fixed feature space, a feature space that varies according to specific patterns, or a fixed data distribution. However, real-world data streams typically exhibit both randomly changing feature spaces and changing data distributions, which makes existing methods ill-suited to practical applications. To fill this gap, this study proposes a novel algorithm, Online Learning for Data Streams with Bi-dynamic Distributions (OLBD). OLBD rests on two main ideas: 1) it copes with random changes in the feature space by building a mapping matrix for space transformation, which projects the original instances onto a global feature space; 2) it handles dynamic data distributions by applying prior-knowledge constraints and transferring established mapping relationships to new distributions. To evaluate OLBD, we compared it with related state-of-the-art algorithms. First, we used 13 datasets to simulate three dynamic feature space scenarios: trapezoidal, feature evolvable, and capricious data streams. Second, we simulated data streams with dynamic data distributions using eight real and four synthetic datasets. We then conducted ablation studies on the parameter α. Finally, we analyzed data streams with bi-dynamic distributions under different feature-missing ratios and verified the generalization ability of OLBD. The results show that OLBD significantly outperforms its competitors. Additionally, a practical case study on movie review classification illustrates the effectiveness of OLBD in real-world scenarios.
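The first idea, projecting instances from a randomly changing feature space onto a global feature space through a mapping matrix, can be illustrated with a minimal Python sketch. It is not the OLBD algorithm: the class names (`GlobalSpaceProjector`, `OnlineLogistic`), the squared-error update used to learn the mapping matrix, and the logistic learner are hypothetical choices made only for illustration. The sketch covers only feature-space changes; it omits the transfer of mappings under distribution shift and the parameter α.

```python
import numpy as np


class GlobalSpaceProjector:
    """Hypothetical sketch (not OLBD itself): keeps a global feature index and a
    mapping matrix M that estimates unobserved features from observed ones."""

    def __init__(self, lr=0.1):
        self.index = {}            # feature name -> column in the global space
        self.M = np.zeros((0, 0))  # M[i, j]: weight of feature j when estimating feature i
        self.lr = lr

    def _grow(self, names):
        # Register unseen features and enlarge M with zero rows/columns.
        for name in names:
            if name not in self.index:
                self.index[name] = len(self.index)
        d, k = len(self.index), self.M.shape[0]
        if d > k:
            grown = np.zeros((d, d))
            grown[:k, :k] = self.M
            self.M = grown

    def project(self, instance):
        """Map a dict {feature_name: value} onto a dense global vector."""
        self._grow(instance)
        d = len(self.index)
        x = np.zeros(d)
        mask = np.zeros(d, dtype=bool)
        for name, value in instance.items():
            x[self.index[name]] = value
            mask[self.index[name]] = True
        # Unobserved entries of x are zero, so M @ x uses observed features only.
        estimate = self.M @ x
        x_full = np.where(mask, x, estimate)  # keep observed values, fill the rest
        # Online squared-error step on the rows of M for observed features.
        residual = estimate - x
        self.M[mask] -= self.lr * np.outer(residual[mask], x)
        np.fill_diagonal(self.M, 0.0)  # a feature never predicts itself
        return x_full


class OnlineLogistic:
    """Hypothetical online logistic-regression learner whose weight vector grows
    with the global feature space."""

    def __init__(self, lr=0.1):
        self.w = np.zeros(0)
        self.lr = lr

    def predict_proba(self, x):
        if self.w.size < x.size:  # newly seen global features start with zero weight
            self.w = np.concatenate([self.w, np.zeros(x.size - self.w.size)])
        return 1.0 / (1.0 + np.exp(-(self.w @ x)))

    def update(self, x, y):
        """Prequential step: predict first, then take one gradient step on log-loss."""
        p = self.predict_proba(x)
        self.w -= self.lr * (p - y) * x
        return p


if __name__ == "__main__":
    projector, learner = GlobalSpaceProjector(), OnlineLogistic()
    # Toy stream whose feature set changes between instances (illustrative only).
    stream = [({"a": 1.0, "b": 2.0}, 1), ({"a": -1.0}, 0), ({"b": 2.0, "c": 0.5}, 1)]
    for features, label in stream:
        p = learner.update(projector.project(features), label)
        print(f"features={sorted(features)} p(y=1)={p:.3f}")
```

Keeping one global index means the learner always sees vectors in a single coordinate system, so features that vanish are filled from the learned relationships rather than dropped; the mapping matrix here is a plain linear regressor per feature, which is a simplifying assumption.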