The idea of combining an active query strategy with the passive-aggressive (PA) update strategy in online learning can be credited to the PA active (PAA) algorithm, which has proven effective in learning linear classifiers from datasets with a fixed feature space. We propose a novel family of online active learning algorithms, named PAA learning for trapezoidal data streams (PAA [Formula: see text]) and multiclass PAA [Formula: see text] (MPAA [Formula: see text]), together with their variants, for binary and multiclass online classification tasks on trapezoidal data streams, where the feature space may expand over time. In the context of an ever-changing feature space, we provide a theoretical analysis of the mistake bounds for both PAA [Formula: see text] and MPAA [Formula: see text]. Our experiments on a wide variety of benchmark datasets confirm that the combination of the instance-regulated active query strategy and the PA update strategy is much more effective in learning from trapezoidal data streams. We have also compared PAA [Formula: see text] with online learning with streaming features (OL [Formula: see text]), the state-of-the-art approach to learning linear classifiers from trapezoidal data streams. PAA [Formula: see text] achieves much better classification accuracy, especially on large-scale real-world data streams.
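To make the combination of active querying and PA updates concrete, the following is a minimal sketch of one binary learning step on a stream whose feature space grows. It is not the algorithm analyzed in this work: it assumes the margin-based query probability delta / (delta + |p|) commonly used with PAA, a PA-I step size capped at C, and zero-padding of the weight vector when new features appear. The names `pad`, `paa_step`, `delta`, and `C` are illustrative, and the instance-regulated query strategy mentioned above is not reproduced.

```python
import random


def pad(w, d):
    """Extend weight vector w with zeros up to dimension d (assumed handling of new features)."""
    return w + [0.0] * (d - len(w))


def paa_step(w, x, y, delta=1.0, C=1.0, rng=random):
    """One active PA-I step. y is +1 or -1; returns (new_w, prediction, queried)."""
    w = pad(w, len(x))                      # grow the model if the instance has more features
    p = sum(wi * xi for wi, xi in zip(w, x))  # signed margin of the current model
    y_hat = 1 if p >= 0 else -1
    # Query the label with probability delta / (delta + |p|):
    # low-confidence predictions are queried more often.
    queried = rng.random() < delta / (delta + abs(p))
    if queried:
        loss = max(0.0, 1.0 - y * p)        # hinge loss on the queried label
        sq_norm = sum(xi * xi for xi in x)
        if loss > 0.0 and sq_norm > 0.0:
            tau = min(C, loss / sq_norm)    # PA-I step size, capped at C
            w = [wi + tau * y * xi for wi, xi in zip(w, x)]
    return w, y_hat, queried


# Usage: the second instance carries one more feature than the first.
w = []
for x, y in [([1.0, 0.0], 1), ([0.0, 0.0, 2.0], -1)]:
    w, y_hat, queried = paa_step(w, x, y)
```

When the label is not queried, the model is left unchanged, which is where the labeling cost is saved; the zero-padding keeps previously learned weights intact while new features start from a neutral value.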