Abstract

This paper introduces a new and expressive algorithm for inducing descriptive rule-sets from streaming data in real time in order to describe frequent patterns explicitly encoded in the stream. Data Stream Mining (DSM) is concerned with the automatic analysis of data streams in real time. Rapid flows of data challenge state-of-the-art processing and communication infrastructure, hence the motivation for research and innovation into real-time algorithms that analyse data streams on the fly and can automatically adapt to concept drifts. To date, DSM techniques have largely focused on predictive data mining applications that aim to forecast the value of a particular target feature of unseen data instances, answering questions such as whether a credit card transaction is fraudulent or not. A real-time, expressive and descriptive Data Mining technique for streaming data has not previously been established as part of the DSM toolkit. This has motivated the work reported in this paper, which has resulted in developing and validating a Generalised Rule Induction (GRI) tool that produces expressive rules as explanations that can be easily understood by human analysts. The expressiveness of decision models in data streams serves the objectives of transparency, underpinning the vision of ‘explainable AI’, and yet it is an area of research that has attracted less attention despite being of high practical importance. The algorithm introduced and described in this paper is termed Fast Generalised Rule Induction (FGRI). FGRI is able to induce descriptive rules incrementally for raw data from both categorical and numerical features. FGRI is able to adapt rule-sets to changes of the pattern encoded in the data stream (concept drift) on the fly as new data arrives and can thus be applied continuously in real time. The paper also provides a theoretical, qualitative and empirical evaluation of FGRI.

Highlights

  • The advances in computing infrastructure and the emergence of applications that process a continuous flow of data records have led to the data stream phenomenon

  • This paper has presented the Fast Generalised Rule Induction (FGRI) algorithm for inducing and maintaining an expressive and descriptive rule-set from data streams

  • Algorithms that induce this type of rule-set only exist for static batch environments, but not for dynamically changing environments where the pattern encoded into the data stream may change, and the rule-set needs to be adapted on the fly

Summary

Introduction

The advances in computing infrastructure and the emergence of applications that process a continuous flow of data records have led to the data stream phenomenon. An important challenge in data mining is learning from large quantities of data through descriptive techniques. Descriptive techniques aim to capture and explain the correlations between any sub-sets of features in order to express interesting insights and patterns, rather than forecasting the value of a particular feature of unseen data instances. A common approach to descriptive learning in data mining is frequent itemsets mining, which expresses many-to-many relationships between the items from a given number of. Association rule learning is one of the techniques based on the concept of ‘Frequent Itemsets Mining’ to discover interesting relationships between items within itemsets.
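
To make the notions of frequent itemsets and association rules concrete, the following minimal sketch mines both from a small static batch of transactions. It is not the FGRI algorithm, which works incrementally on streams; it is a brute-force batch illustration only. The function names, the toy transactions and the thresholds are assumptions introduced here for illustration. Support is taken as the fraction of transactions containing an itemset, and confidence as support(lhs and rhs) divided by support(lhs).

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for itemsets with support >= min_support.

    Brute-force enumeration of every subset of every transaction;
    suitable for toy data only (hypothetical helper, not from the paper).
    """
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(supports, min_confidence):
    """Derive rules (lhs, rhs, support, confidence) from frequent itemsets."""
    rules = []
    for itemset, supp in supports.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so supports[lhs] is always present.
                conf = supp / supports[lhs]
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, supp, conf))
    return rules


# Toy data (assumed for illustration).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

supports = frequent_itemsets(transactions, min_support=0.5)
rules = association_rules(supports, min_confidence=0.9)

for lhs, rhs, supp, conf in rules:
    print(f"IF {set(lhs)} THEN {set(rhs)} (support={supp}, confidence={conf})")
```

On this toy batch the only rule that clears both thresholds is butter implies bread, since every transaction containing butter also contains bread. A streaming method would have to maintain such counts incrementally and revise rules as the underlying pattern drifts, which this batch sketch does not attempt.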
