Abstract

The paper proposes an incremental Gene Expression Programming classifier. Its main features include a two-level ensemble consisting of base classifiers in the form of genes and an upper-level classifier in the form of a metagene. The approach enables us to deal with big datasets by controlling computation time through data reduction mechanisms. The user can control the number of attributes used to induce base classifiers as well as the number of base classifiers used to induce metagenes. To optimize the parameter-setting phase, an approach based on Orthogonal Experiment Design principles is proposed, allowing for statistical evaluation of the influence of different factors on classifier performance. In addition, the algorithm is equipped with a simple mechanism for drift detection. A detailed description of the algorithm is followed by an extensive computational experiment whose results validate the approach and show that it compares favourably with several state-of-the-art incremental classifiers.
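The abstract mentions tuning parameters via Orthogonal Experiment Design rather than a full factorial sweep. As a minimal illustration of that general idea (not the paper's actual design), the sketch below runs the standard L9 orthogonal array for three factors at three levels and estimates each factor's influence by averaging the performance scores per level; all names are invented for illustration.

```python
# Hypothetical sketch of the Orthogonal Experiment Design idea: test only the
# parameter combinations from an orthogonal array, then estimate each factor's
# influence by averaging the observed scores at each of its levels.
# The L9 array covers 3 factors at 3 levels in 9 runs (a full sweep needs 27).

L9 = [
    (0, 0, 0), (0, 1, 1), (0, 2, 2),
    (1, 0, 1), (1, 1, 2), (1, 2, 0),
    (2, 0, 2), (2, 1, 0), (2, 2, 1),
]

def factor_effects(scores):
    """Given one performance score per L9 run, return for each factor the
    mean score observed at each of its three levels."""
    effects = []
    for f in range(3):                      # factor index
        means = []
        for level in range(3):              # level of that factor
            vals = [s for row, s in zip(L9, scores) if row[f] == level]
            means.append(sum(vals) / len(vals))
        effects.append(means)
    return effects
```

A factor whose per-level means differ strongly (here, factor 0 if scores track its level) has a large influence on performance; flat means indicate a weak factor.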

Highlights

  • Learning from the environment through data mining remains an important research challenge

  • Incremental learners can deal with data streams and with big datasets stored in databases for which using the “one-by-one” or “chunk-by-chunk” approach could be more effective than using the traditional “batch” learners, even if no concept drift has been detected

  • We propose a new version of the incremental classifier based on Gene Expression Programming (GEP) with data reduction and a metagene as the final, upper-level, classifier


Summary

Introduction

Learning from the environment through data mining remains an important research challenge. A significant part of these efforts focuses on mining big datasets and data streams. One of the most effective approaches to mining big datasets and data streams is using online or incremental learners. Incremental learners can deal with data streams and with big datasets stored in databases, for which the “one-by-one” or “chunk-by-chunk” approach could be more effective than the traditional “batch” learners, even if no concept drift has been detected. Our approach uses GEP-induced expression trees to construct learners able to deal with large-dataset environments and with the concept drift phenomenon.
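To make the two-level idea concrete, the following sketch shows how a GEP-style gene can encode a classifier as an expression tree over attribute values, and how a metagene can combine base genes by treating their predictions as its terminals. This is a simplified illustration under assumed conventions (prefix-encoded trees, a threshold at zero, invented helper names), not the paper's actual implementation.

```python
# Hypothetical sketch: a GEP "gene" is a prefix-order list of function
# symbols and attribute indices; its expression tree is evaluated on a
# sample and the result thresholded at zero to yield a class label.

FUNCS = {'+': 2, '-': 2, '*': 2}           # function symbol -> arity

def express(gene, sample):
    """Evaluate the expression tree encoded by `gene` (prefix order)
    on `sample`, a list of attribute values."""
    pos = 0
    def walk():
        nonlocal pos
        sym = gene[pos]
        pos += 1
        if sym in FUNCS:                    # internal node: apply function
            a, b = walk(), walk()
            if sym == '+':
                return a + b
            if sym == '-':
                return a - b
            return a * b
        return sample[sym]                  # terminal: attribute index
    return walk()

def predict(gene, sample):
    """Base classifier: threshold the gene's output at zero."""
    return 1 if express(gene, sample) > 0 else 0

def meta_predict(metagene, base_genes, sample):
    """Upper-level classifier: the metagene's terminals are the
    base genes' predictions rather than raw attributes."""
    votes = [predict(g, sample) for g in base_genes]
    return 1 if express(metagene, votes) > 0 else 0
```

Data reduction, as described in the abstract, would then amount to restricting which attribute indices may appear as terminals in base genes and which base genes feed the metagene.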

Related Work
The Proposed Incremental GEP-Based Classifier
Computational Experiment Results
Conclusions