Feature ranking has received considerable attention across various predictive modelling tasks in the batch learning scenario, but comparatively little in the online learning setting. Existing methods that estimate feature importances on data streams have so far focused predominantly on classification and, occasionally, multi-label classification. We propose iSOUP-SymRF, a novel feature ranking method for online multi-target regression, which estimates feature importance scores based on the positions at which a feature appears in the trees of a random forest of iSOUP-Trees, and we additionally extend it to the task of online feature ranking for multi-label classification. By utilizing iSOUP-Trees, which can address multiple structured output prediction tasks on data streams, iSOUP-SymRF promises feature ranking across a variety of online structured output prediction tasks. We examine the convergence of the iSOUP-SymRF rankings in terms of the method’s parameters, the size of the ensemble and the number of selected features, as well as the stability of the rankings under different random seeds. Furthermore, to show the utility of iSOUP-SymRF and its rankings, we use them in conjunction with two state-of-the-art online multi-target regression and multi-label classification methods, iSOUP-Tree and AMRules, and analyze the impact of adding features according to the rankings obtained from iSOUP-SymRF.