Abstract

This paper presents a novel high-speed clustering scheme for high-dimensional data streams. Data stream clustering has gained importance in applications such as network monitoring, intrusion detection, and real-time sensing. Clustering high-dimensional stream data is non-trivial because the stream evolves over time and the dimensionality is high. To tackle this problem, clustering is performed on projected subspaces of the high-dimensional space and on a limited window of data per unit of time. We propose a High Speed and Dimensions data stream clustering scheme (HSDStream), which employs exponential moving averages to reduce memory size and speed up the processing of projected-subspace data streams. It works in three steps: i) initialization, ii) real-time maintenance of core and outlier micro-clusters, and iii) on-demand offline generation of the final clusters. The proposed algorithm is tested against high-dimensional density-based projected clustering (HDDStream) for cluster purity, memory usage, and cluster sensitivity. Experimental results are obtained on the corrected KDD intrusion detection dataset. These results show that HSDStream outperforms HDDStream in all performance metrics, especially memory usage and processing speed.
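The memory saving attributed to exponential moving averages can be illustrated with a minimal sketch. It is not the paper's micro-cluster update rule, which is not given here. It only shows the general technique: one running mean and variance are kept per dimension instead of the raw window of points. The class name `EmaSummary` and the smoothing factor `alpha` are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class EmaSummary:
    """Constant-size summary of one stream dimension (illustrative only)."""

    alpha: float        # smoothing factor in (0, 1]; hypothetical parameter
    mean: float = 0.0   # exponentially weighted mean
    var: float = 0.0    # exponentially weighted variance
    seen: bool = False  # whether at least one value has arrived

    def update(self, x: float) -> None:
        """Fold one new value into the summary in O(1) time and memory."""
        if not self.seen:
            # The first value initializes the summary.
            self.mean, self.var, self.seen = x, 0.0, True
            return
        delta = x - self.mean
        # Standard incremental update for an exponentially weighted
        # mean and variance; older values decay by (1 - alpha) per step.
        self.mean += self.alpha * delta
        self.var = (1.0 - self.alpha) * (self.var + self.alpha * delta * delta)


summary = EmaSummary(alpha=0.5)
for value in [2.0, 4.0]:
    summary.update(value)
```

Whatever the length of the stream, the summary holds only two floats per dimension, which is the kind of saving the abstract refers to.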

Highlights

  • A Novel High Dimensional and High Speed Data Streams Algorithm. Waseem Shahzad, Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad, Pakistan

  • The exponential growth in data mining and clustering is an apparent result of Internet penetration and the use of network applications

  • The new condition is W(t)/N > 90%: if the data points window contains more than 90% of the points, PDIM need not be checked, because a majority of identical data points indicates abnormal activity on the network being monitored
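The 90% shortcut in the last highlight can be sketched as a simple ratio test. The highlight does not define W(t) or N, so the sketch assumes that W(t) is the count of the most frequent (identical) point in the current window and that N is the window size. The function name `can_skip_pdim` and the `threshold` parameter are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def can_skip_pdim(window: Sequence[Hashable], threshold: float = 0.9) -> bool:
    """Return True when identical points dominate the window.

    Assumption: W(t) is the count of the most frequent point in the
    window and N is the window size. The source does not define them.
    """
    if not window:
        return False
    _, top_count = Counter(window).most_common(1)[0]
    # Strict inequality, matching W(t)/N > 90% in the text.
    return top_count / len(window) > threshold


# Points are tuples so that they are hashable and comparable.
flood = [(1.0, 0.0)] * 19 + [(0.2, 0.7)]
skip = can_skip_pdim(flood)
```

With 19 identical points out of 20 the ratio is 0.95, so the check is skipped under these assumptions. At exactly 90% it is not skipped, because the inequality is strict.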


Summary

A Novel High Dimensional and High Speed Data Streams Algorithm

Waseem Shahzad, Department of Computer Science, National University of Computer and Emerging Sciences, Islamabad, Pakistan.

We propose a High Speed and Dimensions data stream clustering scheme (HSDStream), which employs exponential moving averages to reduce memory size and speed up the processing of projected-subspace data streams. It works in three steps: i) initialization, ii) real-time maintenance of core and outlier micro-clusters, and iii) on-demand offline generation of the final clusters. Experimental results are obtained on the corrected KDD intrusion detection dataset. These results show that HSDStream outperforms HDDStream in all performance metrics, especially memory usage and processing speed.

INTRODUCTION
RELATED WORK
PROBLEM FORMULATION
THE HSDSTREAM ALGORITHM
Initialization
Real-time Maintenance of Micro-clusters
Clusters Generation
DISCUSSION
EXPERIMENTAL EVALUATION
Dataset
Cluster Quality Evaluation
Sensitivity and Delay Analysis
Findings
CONCLUSION