Abstract
Data are continuously generated by a wide variety of applications, in large volume and size. Such data are fast-changing and temporally ordered, so data mining has become a field of major interest. A mining technique such as clustering is used to process data streams and group similar objects together. Outliers arising in this process are noisy data points that show abnormal behavior compared with normal data points. To obtain high-quality clusters, outliers should be efficiently discovered and discarded. In this paper, pruning is applied to the stream OPTICS algorithm together with the identification of real outliers, which reduces memory consumption and increases the speed of identifying potential clusters.
Highlights
Traditional data mining methods are less successful on huge data streams because off-line mining is not applicable
Classification of stream data is possible with algorithms such as the Hoeffding tree, the Concept-adapting Very Fast Decision Tree (CVFDT), the Very Fast Decision Tree (VFDT), and the classifier ensemble approach
Proposed architecture: in this paper, the stream OPTICS algorithm is modified by applying a pruning method and dynamically setting threshold cut-off points for the data
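To make the idea of pruning with a dynamically set cut-off more concrete, the following is a minimal sketch, not the paper's actual procedure. It assumes a fixed-size sliding window over the stream, scores each point by an OPTICS-style core distance (the distance to its `min_pts`-th nearest neighbour), and uses a hypothetical cut-off of the window mean plus `k` standard deviations. The function names, the window size, and the parameters `min_pts` and `k` are all illustrative assumptions.

```python
import math
import statistics
from collections import deque


def core_distance(point, others, min_pts):
    """Distance from `point` to its min_pts-th nearest neighbour (OPTICS-style)."""
    dists = sorted(math.dist(point, other) for other in others)
    return dists[min_pts - 1] if len(dists) >= min_pts else math.inf


def prune_window(window, min_pts=2, k=1.5):
    """Split the current window into (kept, outliers).

    The cut-off is recomputed from the window contents on every call,
    so it adapts as the stream changes. This rule is an assumption made
    for illustration; the paper's own threshold rule is not given here.
    """
    points = list(window)
    scores = []
    for i, p in enumerate(points):
        others = points[:i] + points[i + 1:]
        scores.append(core_distance(p, others, min_pts))

    finite = [s for s in scores if math.isfinite(s)]
    if len(finite) < 2:
        return points, []  # too little data to estimate a cut-off

    # Hypothetical dynamic cut-off: mean + k * standard deviation.
    threshold = statistics.mean(finite) + k * statistics.pstdev(finite)
    kept = [p for p, s in zip(points, scores) if s <= threshold]
    outliers = [p for p, s in zip(points, scores) if s > threshold]
    return kept, outliers


# Sliding window over a toy stream: five nearby points and one distant point.
window = deque(maxlen=6)
for point in [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]:
    window.append(point)

kept, outliers = prune_window(window)
```

In this sketch, pruned points are simply dropped from the window before clustering, which is one way the memory and speed benefit described above could arise; a real implementation would also need a rule for distinguishing genuine outliers from the first points of an emerging cluster.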