Tensor decision trees for continual learning from drifting data streams

Bartosz Krawczyk

doi:10.1007/s10994-021-06054-y

Abstract

Data stream classification is one of the most vital areas of contemporary machine learning, as many real-life problems generate data continuously and in large volumes. However, most research in this area focuses on vector-based representations, which are unsuitable for capturing the properties of more complex multi-dimensional structures, such as images and video sequences. In this paper, we propose a novel methodology for learning adaptive decision trees from data streams of tensors. We introduce the Chordal Kernel Decision Tree for continual learning from tensor data streams. To preserve the tensor characteristics, we propose to train and update classifiers in a kernel space designed to work with the tensor representation. We use the chordal distance to compute similarities between tensors and then use these similarities as a new feature space in which decision trees are trained. This allows for direct decision tree induction on tensors. To accommodate the streaming and drifting nature of the data, we propose a concept drift detection scheme based on the tensor representation. It allows us to reconstruct the kernel feature space every time a change is detected. The proposed approach allows for fast and efficient induction of decision trees on streaming data with tensor representation. An experimental study, conducted on 4 real-world and 52 artificial large-scale tensor data streams, shows that using the native tensor feature space leads to more accurate classification than using vectorized representations.
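
As an illustration of the idea of using chordal distances between tensors as a feature space, the following is a minimal sketch and not the implementation from the paper. It assumes one common formulation: each tensor is unfolded along every mode, a truncated orthonormal basis is taken from the SVD of each unfolding, and the squared projection-metric distances between the resulting subspaces are summed over modes. The function names, the use of left singular vectors, and the `rank` parameter are all assumptions made for this example.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: bring `mode` to the front and flatten the remaining axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_basis(tensor, mode, rank):
    """Orthonormal basis of the leading `rank`-dimensional column space of an unfolding.

    Assumption: left singular vectors are used; other formulations use the row space.
    """
    u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
    return u[:, :rank]


def chordal_distance(a, b, rank=2):
    """Chordal (projection-metric) distance between two same-shaped tensors.

    For orthonormal bases Ua, Ub with `rank` columns each:
        ||Ua Ua^T - Ub Ub^T||_F^2 = 2 * (rank - ||Ua^T Ub||_F^2)
    The squared distances are summed over all modes (an assumed aggregation).
    """
    total = 0.0
    for mode in range(a.ndim):
        ua = mode_basis(a, mode, rank)
        ub = mode_basis(b, mode, rank)
        total += 2.0 * (rank - np.linalg.norm(ua.T @ ub, "fro") ** 2)
    # Guard against tiny negative values caused by floating-point rounding.
    return float(np.sqrt(max(total, 0.0)))


def distance_features(tensors, references, rank=2):
    """Represent each tensor by its chordal distances to a set of reference tensors."""
    return np.array(
        [[chordal_distance(t, r, rank) for r in references] for t in tensors]
    )
```

Each row of the matrix returned by `distance_features` is an ordinary feature vector, so it could be passed to any standard decision tree learner. Under this sketch, reacting to a detected drift would amount to choosing a new set of reference tensors and recomputing the features.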

Full Text