Abstract

This paper focuses on the use of fractal dimension in clustering evolving data streams. Recently, Anuradha et al. proposed a new approach for clustering evolving data streams based on Relative Change in Fractal Dimension (RCFD) and the damped window model. Observations on that work reveal that the formation of quality clusters depends heavily on a suitable choice of threshold value. Anuradha et al. fixed the threshold value heuristically. Although the outcome of their approach is acceptable, the selection is essentially random, which gives no basis for claiming that the value is acceptable in general. This paper proposes a novel method that computes the threshold value optimally using a population-based randomized approach, particle swarm optimization (PSO). Simulations are run on two large data sets, the KDD Cup 1999 data set and the Forest Covertype data set, and the resulting cluster quality is compared with that of the fixed-threshold approach. The comparison reveals that the threshold value chosen by Anuradha et al. is robust and can be used with confidence.
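To make the idea of searching for a threshold with PSO concrete, the following is a minimal sketch of a global-best PSO over a single bounded parameter. It is not the authors' implementation: the function names, the swarm parameters (`inertia`, `c1`, `c2`), and especially `toy_objective` are assumptions made for illustration. In the setting described above, the objective would presumably be derived from cluster quality on the data stream; here a simple quadratic with an arbitrarily placed minimum stands in for it.

```python
import random


def pso_minimize(objective, lower, upper, n_particles=20, n_iters=60,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize a 1-D objective over [lower, upper] with a basic global-best PSO.

    All parameter defaults are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    pos = [rng.uniform(lower, upper) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    best_pos = pos[:]                        # personal best positions
    best_val = [objective(p) for p in pos]   # personal best objective values
    g = min(range(n_particles), key=best_val.__getitem__)
    gbest_pos, gbest_val = best_pos[g], best_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update: inertia + pull toward personal and global bests.
            vel[i] = (inertia * vel[i]
                      + c1 * r1 * (best_pos[i] - pos[i])
                      + c2 * r2 * (gbest_pos - pos[i]))
            # Move the particle and clamp it to the search interval.
            pos[i] = min(upper, max(lower, pos[i] + vel[i]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i], val
                if val < gbest_val:
                    gbest_pos, gbest_val = pos[i], val
    return gbest_pos, gbest_val


def toy_objective(threshold):
    # Hypothetical stand-in for a clustering-quality cost; the minimum at 0.4
    # is arbitrary and is NOT a result from the paper.
    return (threshold - 0.4) ** 2


best_threshold, best_cost = pso_minimize(toy_objective, 0.0, 1.0)
print(f"best threshold: {best_threshold:.4f}, cost: {best_cost:.2e}")
```

Replacing `toy_objective` with a function that runs the clustering for a given threshold and returns, for example, one minus a quality score would turn this into a threshold search, at the cost of one full clustering run per evaluation.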

Highlights

  • Clustering is partitioning data into similar objects, where each cluster can serve as a model of that similarity

  • After applying particle swarm optimization (PSO), the optimal value of the threshold ε is found to be 0.23. With this value, clustering based on Correlation Fractal Dimension is performed, as in [19], on the KDD Cup 1999 data set and the Forest Covertype data set, and the purity of the clustering is evaluated

  • This paper investigates a new approach for choosing a suitable minimum threshold value that helps identify the appropriate group or cluster in the Relative Change in Fractal Dimension (RCFD) based clustering technique
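The purity evaluation mentioned in the highlights can be illustrated with a short sketch. It assumes one common definition of purity (the fraction of points that carry the majority class label of their cluster); the source does not state which variant the authors use, and the cluster assignments and labels below are made up for the example.

```python
from collections import Counter, defaultdict


def purity(cluster_ids, class_labels):
    """Fraction of points whose class label is the majority label of their cluster."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one point is required")

    # Count class labels separately inside each cluster.
    by_cluster = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        by_cluster[cluster][label] += 1

    # Each cluster contributes the size of its largest class.
    majority_total = sum(max(counts.values()) for counts in by_cluster.values())
    return majority_total / len(cluster_ids)


# Hypothetical toy assignment: three clusters over eight labelled points.
clusters = [0, 0, 0, 1, 1, 1, 2, 2]
labels = ["normal", "normal", "attack", "attack", "attack", "attack",
          "normal", "attack"]
score = purity(clusters, labels)
print(score)
```

A purity of 1.0 means every cluster contains a single class. Because purity tends to rise as the number of clusters grows, it is usually read together with the cluster count.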


Summary

Introduction

How to cite this paper: Yarlagadda, A., Murthy, J.V.R. and Krishna Prasad, M.H.M. (2014) Particle Swarm Optimized Optimal Threshold Value Selection for Clustering based on Correlation Fractal Dimension.

Clustering is partitioning data into similar objects, where each cluster can serve as a model of that similarity. Representing data with fewer clusters necessarily loses certain fine details, while a large number of clusters may not give good results either. The gathered data is not useful by itself; it must be processed to extract relevant and useful information for further analysis by a domain specialist. These requirements motivated researchers in the field of data stream mining [3]. Since data streams are continuous sequences of information, the underlying clusters can change with time, giving different results depending on the time horizon over which they are computed.
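One way to handle clusters that change with time is a damped window, in which older points count for less. The sketch below uses the exponential fading function 2^(-λ·age) that is common in the stream clustering literature. Whether the referenced approach uses exactly this form is an assumption, and the function names and the decay rate are hypothetical.

```python
def damped_weight(age, decay_rate):
    """Weight of a point that arrived `age` time units ago (assumed form 2^(-decay_rate * age))."""
    if age < 0:
        raise ValueError("age must be non-negative")
    return 2.0 ** (-decay_rate * age)


def weighted_count(arrival_times, now, decay_rate):
    """Damped size of a cluster: the sum of the weights of points seen up to `now`."""
    return sum(damped_weight(now - t, decay_rate)
               for t in arrival_times if t <= now)


# Three points arriving at times 0, 2 and 4, evaluated at time 4 with decay rate 0.5:
# weights are 2^-2 = 0.25, 2^-1 = 0.5 and 2^0 = 1.0.
size_now = weighted_count([0, 2, 4], now=4, decay_rate=0.5)
print(size_now)
```

A larger decay rate shortens the effective time horizon, so the clustering reacts faster to change but forgets older structure sooner.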

