A Fuzzy Approach For Classification and Novelty Detection in Data Streams Under Intermediate Latency

Heloisa Camargo, Tiago Pinho Da Silva, André Cristiani

doi:10.48448/y8wp-7j72

Abstract

Novelty detection is an important topic in data stream classification, as it is responsible for identifying the emergence of new concepts, new patterns, and outliers. It becomes necessary when the true label of an instance is not available immediately after the instance is classified. The time between the classification of an instance and the arrival of its true label is called latency, and nonzero latency is common in data stream applications. However, most classification algorithms ignore this problem and assume that there is no latency. At the other extreme, a few methods in the literature address novelty detection in data streams under infinite latency, where true labels never arrive. In this work, we focus instead on the scenario in which the true labels become available to the system after a certain delay, called intermediate latency. Such a scenario arises, for example, in stock market and weather datasets. Moreover, to make learning more flexible in the face of the uncertainties inherent in data streams, we use concepts from fuzzy set theory. We therefore propose a method for classification and novelty detection in data streams called Fuzzy Classifier with Novelty Detection for data streams (FuzzCND). Our method uses an ensemble of fuzzy decision trees to classify new instances and applies concepts from fuzzy set theory to detect possible novelties. The experiments showed that our approach is promising in dealing with the emergence of new concepts in data streams and with inaccuracies in the data.
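
The three latency regimes mentioned in the abstract can be illustrated with a minimal sketch of a labelled stream whose true labels are released only after a fixed delay. This is not part of FuzzCND; the function name `stream_with_latency`, the fixed per-instance delay, and the toy data are all assumptions made for illustration.

```python
from collections import deque


def stream_with_latency(stream, latency):
    """Simulate delayed labels (hypothetical helper, not from the paper).

    `stream` is an iterable of (features, true_label) pairs.
    At each time step t, yields (t, features, released), where `released`
    is a list of (t0, features0, label0) whose labels become available at t,
    i.e. instances seen at least `latency` steps earlier.
    """
    pending = deque()  # instances still waiting for their true label
    for t, (x, y) in enumerate(stream):
        pending.append((t, x, y))
        released = []
        # Release every pending instance that has waited long enough.
        while pending and t - pending[0][0] >= latency:
            released.append(pending.popleft())
        yield t, x, released


# Toy stream (assumed data): four instances with labels "a" and "b".
toy = [((0.1,), "a"), ((0.2,), "b"), ((0.3,), "a"), ((0.4,), "b")]

no_latency = list(stream_with_latency(toy, 0))              # label arrives at once
intermediate = list(stream_with_latency(toy, 2))            # label arrives 2 steps later
infinite = list(stream_with_latency(toy, float("inf")))     # label never arrives
```

With `latency=0` the label is available in the same step as the instance, with `latency=float("inf")` no label is ever released, and any finite positive value corresponds to the intermediate-latency setting, in which a classifier must predict first and can only update on the true label later.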
