Abstract

Mining data streams is a core element of Big Data Analytics. It addresses the velocity of large datasets, one of the four aspects of Big Data, the other three being volume, variety and veracity. As data streams in, models are constructed using data mining techniques tailored towards continuous and fast model update. The Hoeffding Inequality has been among the most successful approaches in learning theory for data streams. In this context, it is typically used to provide a statistical bound on the number of examples needed in each step of an incremental learning process, and it has been applied to both classification and clustering problems. Despite the success of the Hoeffding Tree classifier and other data stream mining methods, such models fall short of explaining how their results (i.e., classifications) are reached (black boxing). The expressiveness of decision models in data streams is an area of research that has attracted less attention, despite its paramount practical importance. In this paper, we address this issue, adopting the Hoeffding Inequality as an upper bound to build decision rules which can help decision makers with informed predictions (white boxing). We term our novel method Hoeffding Rules, reflecting its use of the Hoeffding Inequality to estimate whether a rule induced from a smaller sample would be of the same quality as a rule induced from a larger sample. The new method brings a number of novel contributions, including handling uncertainty through abstaining, dealing with continuous data through Gaussian statistical modelling, and an algorithm shown experimentally to be fast. We conducted a thorough experimental study on benchmark datasets, showing the efficiency and expressiveness of the proposed technique compared with the state of the art.
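Although the paper's exact formulation is not reproduced on this page, the Hoeffding bound referred to above is commonly stated as ε = sqrt(R² ln(1/δ) / (2n)). The sketch below shows how such a bound might be evaluated in a streaming setting; the function name, parameters and the acceptance test described in the comments are illustrative assumptions, not the paper's implementation.

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """With probability at least 1 - delta, the observed mean of n i.i.d.
    samples of a quantity with range `value_range` deviates from the true
    mean by at most the returned epsilon."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))

# Illustrative use: a rule-quality measure in [0, 1], 99% confidence,
# 500 examples observed so far.
epsilon = hoeffding_bound(value_range=1.0, delta=0.01, n=500)

# A stream learner would typically accept the current best rule once its
# observed quality exceeds that of the runner-up by more than epsilon, i.e.
# the rule induced from this smaller sample is judged to be as good as one
# induced from a much larger sample.
print(f"epsilon = {epsilon:.4f}")
```

The key property is that ε shrinks as the number of observed examples n grows, which is what allows a decision to be made early on a smaller sample with a controlled risk of error.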

Highlights

  • One problem the research area of ‘Big Data Analytics’ is concerned with is the analysis of high velocity data, known as streaming data [1, 2], that challenge our computational resources

  • The research presented in this paper is motivated by the fact that rule-based data stream classification models are more expressive than other models, such as decision tree models, instance-based models and probabilistic models

  • Inducing a classifier on data streams has some unique challenges compared with data mining from batch data, as the pattern encoded in the stream may change over time, which is known as concept drift


Summary

Introduction

One problem the research area of ‘Big Data Analytics’ is concerned with is the analysis of high velocity data, known as streaming data [1, 2], that challenge our computational resources. Accuracy has been the dominant measure of interest when comparing classifiers in both static and streaming environments, yet real-time decision making based on streaming models still suffers from a lack of trust [17]. To address this issue, the user can specify an accuracy loss band (ζ): the chosen model must be expressive enough to earn trust, while its accuracy is tolerated at up to ζ% below that of the best performing, less expressive classifier (which can be a total black box).
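To make the accuracy loss band concrete, the following sketch selects the most expressive candidate whose accuracy is within ζ of the best performer; the model names, scores, expressiveness scale and selection function are hypothetical illustrations, not taken from the paper.

```python
def select_expressive_model(candidates, zeta):
    """Pick the most expressive model whose accuracy is within `zeta` of the
    best-performing candidate (which may itself be a black box).

    `candidates` holds (name, accuracy, expressiveness) tuples; the names,
    scores and expressiveness scale below are illustrative assumptions only.
    """
    best_accuracy = max(accuracy for _, accuracy, _ in candidates)
    tolerated = [c for c in candidates if c[1] >= best_accuracy - zeta]
    return max(tolerated, key=lambda c: c[2])

models = [
    ("HoeffdingTree", 0.91, 2),   # accurate but harder to interpret
    ("HoeffdingRules", 0.89, 5),  # slightly less accurate, far more expressive
    ("NaiveBayes", 0.87, 3),
]
print(select_expressive_model(models, zeta=0.03))  # -> ('HoeffdingRules', 0.89, 5)
```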

Related Work
Hoeffding Rules
Probability Density Distribution for Expressive Continuous Rule Terms
Using the Hoeffding Bound to Ensure Quality of Learnt Rules from a Smaller Sample
Overall Learning Process of Hoeffding Rules
Experimental Evaluation and Discussion
Datasets
Abstaining from Classification
Conclusions
