Abstract

One of the most important challenges for machine learning community is to develop efficient classifiers which are able to cope with data streams, especially with the presence of the so-called concept drift. This phenomenon is responsible for the change of classification task characteristics, and poses a challenge for the learning model to adapt itself to the current state of the environment. So there is a strong belief that one-class classification is a promising research direction for data stream analysis--it can be used for binary classification without an access to counterexamples, decomposing a multi-class data stream, outlier detection or novel class recognition. This paper reports a novel modification of weighted one-class support vector machine, adapted to the non-stationary streaming data analysis. Our proposition can deal with the gradual concept drift, as the introduced one-class classifier model can adapt its decision boundary to new, incoming data and additionally employs a forgetting mechanism which boosts the ability of the classifier to follow the model changes. In this work, we propose several different strategies for incremental learning and forgetting, and additionally we evaluate them on the basis of several real data streams. Obtained results confirmed the usability of proposed classifier to the problem of data stream classification with the presence of concept drift. Additionally, implemented forgetting mechanism assures the limited memory consumption, because only quite new and valuable examples should be memorized.

Highlights

  • Contemporary computer systems manage and store enormous amount of data

  • The main aim of this paper is to introduce an efficient method of incremental data stream classification, i.e., we will consider the task where the concept drift is rather smooth and the classifier model will try to follow the model’s changes

  • We propose to use principles of incremental learning and forgetting to adjust shape of the decision boundary according to the concept changes in new data chunks

Read more

Summary

Introduction

Contemporary computer systems manage and store enormous amount of data. It is predicted that the volume of stored information will be doubling every two years. The marked leading companies desire to develop smart analytic tools based on machine learning approach, which can analyze such enormous amount of data. Designing such analytical tools should take into a consideration that most of data arrives continuously in the form of so-called data stream (Gama 2010). Data streams differ from the traditional static data, because they can be viewed as an infinite amount of data that arrives continuously, where memory and computational complexity play the crucial roles. Due to this mining data stream poses many new

Objectives
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call