Online learning from imbalanced data streams, in which classes are unevenly distributed, has recently attracted wide attention. Existing approaches typically assume a stationary feature space and, in the supervised setting, assume that labels are available for every instance in the stream. However, in many real scenarios, e.g., environmental monitoring, new features appear and old features are partially lost as the environment changes, because different sensors have different lifespans. Moreover, each instance must be labeled by experts, which makes labels expensive and scarce. To address these problems, this paper proposes a novel Online Imbalance learning with unpredictable Feature evolution and Label scarcity (OIFL) algorithm. First, we use margin-based online active learning to selectively label valuable instances. After obtaining the labels, we handle the imbalanced class distribution by optimizing the F-measure, transforming F-measure optimization into weighted surrogate loss minimization. When data streams arrive with augmented features, we combine the online passive-aggressive algorithm with structural risk minimization to update the classifier in the divided feature space. When data streams arrive with incomplete features, we use variance to identify the most informative features, following the empirical risk minimization principle, and continue to update the existing classifier as before. Finally, we obtain a sparse but reliable learner through a projecting truncation strategy. We provide a theoretical analysis of OIFL, and experiments on synthetic datasets and real-world data streams validate the effectiveness of our method.
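To make two of the ingredients named above concrete, the following is a minimal sketch of margin-based label querying combined with a class-weighted passive-aggressive update. It is not the OIFL algorithm: it omits feature evolution, the F-measure derivation, and truncation. The query rule b / (b + |margin|), the PA-I step size, and all function and parameter names (`query_probability`, `pa_update`, `run_stream`, `b`, `C`, `pos_weight`, `neg_weight`) are assumptions chosen for illustration, with the class weights standing in for the weighted surrogate loss.

```python
import random


def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))


def query_probability(margin, b=1.0):
    """Assumed query rule: instances near the decision boundary
    (small |margin|) are more likely to be sent for labeling."""
    return b / (b + abs(margin))


def pa_update(w, x, y, weight=1.0, C=1.0):
    """One PA-I style step on the hinge loss, with the step size
    capped by C * weight so one class can be emphasized."""
    loss = max(0.0, 1.0 - y * dot(w, x))
    norm_sq = dot(x, x)
    if loss == 0.0 or norm_sq == 0.0:
        return w  # passive: the margin is already satisfied
    tau = min(C * weight, loss / norm_sq)
    return [wi + tau * y * xi for wi, xi in zip(w, x)]


def run_stream(stream, dim, b=1.0, pos_weight=1.0, neg_weight=1.0, seed=0):
    """Process (x, y) pairs one at a time; y in {-1, +1} is only
    used when the instance is selected for labeling."""
    rng = random.Random(seed)
    w = [0.0] * dim
    queried = 0
    for x, y in stream:
        margin = dot(w, x)
        if rng.random() < query_probability(margin, b):
            queried += 1
            weight = pos_weight if y > 0 else neg_weight
            w = pa_update(w, x, y, weight)
    return w, queried
```

Under these assumptions, setting `pos_weight` larger than `neg_weight` lets updates on the minority (positive) class take larger steps, which is one simple way to bias an online learner on an imbalanced stream.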