Abstract

Ensembles of classifiers are among the best performing classifiers available in many data mining applications, including the mining of data streams. Rather than training one classifier, multiple classifiers are trained, and their predictions are combined according to a given voting scheme. An important prerequisite for ensembles to be successful is that the individual models are diverse. One way to vastly increase the diversity among the models is to build a heterogeneous ensemble, composed of fundamentally different model types. However, most ensembles developed specifically for the dynamic data stream setting rely on only one type of base-level classifier, most often Hoeffding Trees. We study the use of heterogeneous ensembles for data streams. We introduce the Online Performance Estimation framework, which dynamically weights the votes of the individual classifiers in an ensemble. Using an internal evaluation on recent training data, it measures how well each ensemble member has performed on recent observations and updates the members' weights accordingly. Experiments over a wide range of data streams show performance that is competitive with state-of-the-art ensemble techniques, including Online Bagging and Leveraging Bagging, while being significantly faster. All experimental results from this work are easily reproducible and publicly available online.
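
As a concrete illustration, the sketch below (our own, with hypothetical names; not the authors' implementation) shows how a performance-weighted vote over heterogeneous members can be combined, assuming each member exposes a predict method:

    from collections import defaultdict

    def weighted_vote(members, weights, x):
        """Combine the members' predictions for instance x, weighting
        each vote by that member's estimated recent performance."""
        scores = defaultdict(float)
        for member, weight in zip(members, weights):
            scores[member.predict(x)] += weight
        # Return the class label with the highest total weighted vote.
        return max(scores, key=scores.get)

Because the weights are recomputed as the stream progresses, members that perform poorly on recent observations lose influence without being retrained or removed.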

Highlights

  • Real-time analysis of data streams is a key area of data mining research

  • One way to vastly improve the performance of ensembles is to build heterogeneous ensembles, consisting of models generated by different techniques, rather than homogeneous ensembles, in which all models are generated by the same technique

  • We ran all ensemble techniques on all data streams

Summary

Introduction

Real-time analysis of data streams is a key area of data mining research. Many real-world data sources produce streams, in which observations arrive one by one, and the algorithms that process them are often subject to time and memory constraints. As data streams are constantly subject to change, the most accurate classifier for a given interval of observations changes frequently, as illustrated by Fig. 1. In their seminal paper, Littlestone and Warmuth (1994) describe a strategy to weight the votes of ensemble members based on their performance on recent observations, and they prove certain error bounds. While this work is of great theoretical value, it needs non-trivial adjustments to be applicable to practical data streams. Based on this approach, we propose a way to measure the performance of ensemble members on recent observations and to combine their votes accordingly. We define Online Performance Estimation, a framework that provides dynamic weighting of the votes of individual ensemble members across the stream. Utilising this framework, we introduce a new ensemble technique that combines heterogeneous models.
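
To make the weighting concrete, the following sketch (our own; the window size and the 0/1 loss are illustrative assumptions, not necessarily the paper's exact design) maintains a sliding window of correctness flags per ensemble member and derives each member's vote weight from its accuracy over that window:

    from collections import deque

    class OnlinePerformanceEstimator:
        """Sliding-window estimate of each member's recent accuracy."""

        def __init__(self, n_members, window_size=1000):
            # One fixed-length window of 0/1 correctness flags per member.
            self.windows = [deque(maxlen=window_size)
                            for _ in range(n_members)]

        def update(self, predictions, y_true):
            # Record whether each member's latest prediction was correct.
            for window, y_hat in zip(self.windows, predictions):
                window.append(1.0 if y_hat == y_true else 0.0)

        def weights(self):
            # A member's weight is its accuracy over the recent window;
            # members with no recorded predictions get a neutral weight.
            return [sum(w) / len(w) if w else 1.0 for w in self.windows]

In a test-then-train loop, each labelled observation is first predicted by all members (and the weighted vote is emitted) before the estimator and the members themselves are updated, so the weights always reflect performance on observations the members had not yet seen.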

Related work
Methods
Online performance estimation
Ensemble composition
Experimental setup
Results
Parameter effect
Grace parameter
Number of active classifiers
Conclusions