Abstract

One of the main characteristics of data streams is that examples arrive continuously and their distribution can change over time. An important capability of data stream classifiers is novelty detection, which makes it possible to identify new classes as they emerge, even before their true labels are available, and to update the decision model accordingly. Some approaches wait for the true labels of already-classified examples to arrive and use this information to update and maintain the decision model; the time between classifying an example and receiving its true label is called latency. Most classification algorithms, however, assume that the true labels are available immediately after an example is classified. Other methods in the literature assume infinite latency and update their decision models in an unsupervised manner. In this work, we focus on a scenario rarely addressed in the literature, called intermediate latency, in which the true labels become available after a certain delay. In addition, to enable more flexible learning that can cope with the imprecision inherent to data streams, we use concepts from fuzzy set theory. We therefore propose a method called Enhanced Fuzzy Classifier with Multi-class Novelty Detection for Data Streams (EFuzzCND), which uses fuzzy clustering to perform classification and novelty detection in data streams. We compared our approach with a well-known method from the literature that uses hard clustering, in order to show the main advantages and disadvantages of using fuzzy clustering for data stream classification.
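The abstract does not give EFuzzCND's equations, so the following is only a rough illustration of how fuzzy clustering differs from hard clustering when classifying a stream example. It uses the standard fuzzy c-means membership formula, in which a point receives a graded membership in every cluster (memberships sum to 1) instead of a single hard assignment. The per-class aggregation and the novelty rule (flag a point as `"unknown"` when it lies outside a hypothetical radius around its nearest centroid) are assumptions made for this sketch, not the method's actual criteria; the names `fcm_memberships`, `classify`, `radii`, and `radius_factor` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]


def fcm_memberships(x: Point, centroids: List[Point], m: float = 2.0) -> List[float]:
    """Standard fuzzy c-means membership of x in each centroid (values sum to 1).

    m > 1 is the fuzzifier; larger m gives softer (more uniform) memberships.
    """
    dists = [math.dist(x, c) for c in centroids]
    # If x coincides with a centroid, it belongs fully to that cluster.
    for i, d in enumerate(dists):
        if d == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centroids))]
    exp = 2.0 / (m - 1.0)
    return [
        1.0 / sum((dists[i] / dists[k]) ** exp for k in range(len(dists)))
        for i in range(len(dists))
    ]


def classify(
    x: Point,
    centroids: List[Point],
    labels: List[str],
    radii: List[float],
    m: float = 2.0,
    radius_factor: float = 1.0,
) -> Tuple[str, Dict[str, float]]:
    """Hypothetical classifier: returns (label or "unknown", membership per class).

    Assumption for illustration: a point farther than radius_factor * radius
    from its nearest centroid is treated as a novelty candidate ("unknown").
    """
    u = fcm_memberships(x, centroids, m)
    # Sum cluster memberships per class label (several clusters may share a class).
    per_class: Dict[str, float] = {}
    for ui, lab in zip(u, labels):
        per_class[lab] = per_class.get(lab, 0.0) + ui
    nearest = min(range(len(centroids)), key=lambda i: math.dist(x, centroids[i]))
    if math.dist(x, centroids[nearest]) > radius_factor * radii[nearest]:
        return "unknown", per_class
    return max(per_class, key=per_class.get), per_class


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (4.0, 0.0)]
    labels = ["A", "B"]
    radii = [1.5, 1.5]
    print(classify((1.0, 0.0), centroids, labels, radii)[0])  # → A
    print(classify((2.0, 5.0), centroids, labels, radii)[0])  # → unknown
```

With fuzzifier `m = 2`, the point `(1.0, 0.0)` gets membership of about 0.9 in the cluster of class A and 0.1 in class B, whereas a hard-clustering classifier would keep only the single nearest assignment. A point far from every cluster is set aside as a novelty candidate, which in a real stream method would then be grouped with other unknowns before deciding whether a new class has emerged.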
