Abstract

The paper proposes an incremental Gene Expression Programming classifier. Its main features include a two-level ensemble consisting of base classifiers in the form of genes and an upper-level classifier in the form of a metagene. The approach enables us to deal with big datasets by controlling computation time through data reduction mechanisms. The user can control the number of attributes used to induce base classifiers as well as the number of base classifiers used to induce metagenes. To optimize the parameter-setting phase, an approach based on Orthogonal Experiment Design principles is proposed, allowing for statistical evaluation of the influence of different factors on classifier performance. In addition, the algorithm is equipped with a simple mechanism for drift detection. A detailed description of the algorithm is followed by an extensive computational experiment whose results validate the approach and show that it compares favourably with several state-of-the-art incremental classifiers.
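The abstract mentions tuning parameters via Orthogonal Experiment Design rather than a full factorial sweep. As a minimal illustration of that general idea (not the paper's actual design), the sketch below runs the standard L9 orthogonal array for three factors at three levels and estimates each factor's influence by averaging the performance scores per level; all names are invented for illustration.

```python
# Hypothetical sketch of the Orthogonal Experiment Design idea: test only the
# parameter combinations from an orthogonal array, then estimate each factor's
# influence by averaging the observed scores at each of its levels.
# The L9 array covers 3 factors at 3 levels in 9 runs (a full sweep needs 27).

L9 = [
    (0, 0, 0), (0, 1, 1), (0, 2, 2),
    (1, 0, 1), (1, 1, 2), (1, 2, 0),
    (2, 0, 2), (2, 1, 0), (2, 2, 1),
]

def factor_effects(scores):
    """Given one performance score per L9 run, return for each factor the
    mean score observed at each of its three levels."""
    effects = []
    for f in range(3):                      # factor index
        means = []
        for level in range(3):              # level of that factor
            vals = [s for row, s in zip(L9, scores) if row[f] == level]
            means.append(sum(vals) / len(vals))
        effects.append(means)
    return effects
```

A factor whose per-level means differ strongly (here, factor 0 if scores track its level) has a large influence on performance; flat means indicate a weak factor.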

Highlights

  • Learning from the environment through data mining remains an important research challenge

  • Incremental learners can deal with data streams and with big datasets stored in databases for which using the “one-by-one” or “chunk-by-chunk” approach could be more effective than using the traditional “batch” learners, even if no concept drift has been detected

  • We propose a new version of the incremental classifier based on Gene Expression Programming (GEP) with data reduction and a metagene as the final, upper-level, classifier


Summary

Introduction

Learning from the environment through data mining remains an important research challenge. A significant part of these efforts focuses on mining big datasets and data streams. One of the most effective approaches to mining big datasets and data streams is using online or incremental learners. Incremental learners can deal with data streams and with big datasets stored in databases, for which the “one-by-one” or “chunk-by-chunk” approach could be more effective than the traditional “batch” learners, even if no concept drift has been detected. Our approach uses GEP-induced expression trees to construct learners able to deal with large-dataset environments and with the concept drift phenomenon.
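To make the two-level idea concrete, the following sketch shows how a GEP-style gene can encode a classifier as an expression tree over attribute values, and how a metagene can combine base genes by treating their predictions as its terminals. This is a simplified illustration under assumed conventions (prefix-encoded trees, a threshold at zero, invented helper names), not the paper's actual implementation.

```python
# Hypothetical sketch: a GEP "gene" is a prefix-order list of function
# symbols and attribute indices; its expression tree is evaluated on a
# sample and the result thresholded at zero to yield a class label.

FUNCS = {'+': 2, '-': 2, '*': 2}           # function symbol -> arity

def express(gene, sample):
    """Evaluate the expression tree encoded by `gene` (prefix order)
    on `sample`, a list of attribute values."""
    pos = 0
    def walk():
        nonlocal pos
        sym = gene[pos]
        pos += 1
        if sym in FUNCS:                    # internal node: apply function
            a, b = walk(), walk()
            if sym == '+':
                return a + b
            if sym == '-':
                return a - b
            return a * b
        return sample[sym]                  # terminal: attribute index
    return walk()

def predict(gene, sample):
    """Base classifier: threshold the gene's output at zero."""
    return 1 if express(gene, sample) > 0 else 0

def meta_predict(metagene, base_genes, sample):
    """Upper-level classifier: the metagene's terminals are the
    base genes' predictions rather than raw attributes."""
    votes = [predict(g, sample) for g in base_genes]
    return 1 if express(metagene, votes) > 0 else 0
```

Data reduction, as described in the abstract, would then amount to restricting which attribute indices may appear as terminals in base genes and which base genes feed the metagene.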

Related Work
The Proposed Incremental GEP-Based Classifier
Computational Experiment Results
Conclusions