Abstract

Data streams create new challenges for fuzzy clustering algorithms, in particular Interval Type-2 Fuzzy C-Means (IT2FCM). One problem with IT2FCM is that it is sensitive to initialization and therefore may fail to reach the global optimum. This problem has been addressed by optimizing IT2FCM with Ant Colony Optimization (IT2FCM-ACO). However, IT2FCM-ACO obtains clusters for the whole dataset at once, which is not suitable for large streaming datasets that arrive continuously and evolve over time, so that the clusters themselves evolve as well. Additionally, because of its size, the incoming data may not fit in memory all at once. Therefore, to address the challenges of a large data stream environment, we propose improving IT2FCM-ACO so that it generates clusters incrementally. The proposed algorithm first determines appropriate cluster centers on a certain percentage of the available data; the obtained centroids are then combined with newly arriving data points to generate the next set of cluster centers. The process continues until all the data have been scanned. Previously processed data points are released from memory, which reduces time and space complexity, while the incremental method still produces data partitions comparable to those of IT2FCM-ACO. The performance of the proposed method is evaluated on large real-life datasets. The results obtained from several fuzzy cluster validity indices show the enhanced performance of the proposed method over other clustering algorithms. The proposed algorithm also reduces run time and produces excellent speed-ups for all datasets.
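The chunk-by-chunk scheme described above can be illustrated with a minimal sketch. It is not the authors' algorithm: all function names are hypothetical, it uses NumPy, and it makes several assumptions. Interval memberships come from the usual IT2FCM construction with two fuzzifiers m1 < m2 (values 1.5 and 2.5 are assumed). Karnik-Mendel type reduction is replaced by a simple average of the lower- and upper-bound centroids. Carried-over centroids re-enter the next chunk as weighted pseudo-points, an idea borrowed from single-pass FCM. ACO seeding is replaced by plain farthest-point initialization.

```python
import numpy as np


def _fcm_memberships(X, V, m):
    """Type-1 FCM memberships (c x n) of points X (n x d) to centroids V (c x d)."""
    d2 = ((V[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    d2 = np.fmax(d2, 1e-12)  # guard against a point sitting exactly on a centroid
    # u_ij = 1 / sum_k (d_ij^2 / d_kj^2)^(1 / (m - 1)); each column sums to 1.
    ratio = (d2[:, None, :] / d2[None, :, :]) ** (1.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)


def _interval_memberships(X, V, m1, m2):
    """Lower/upper membership bounds from the two fuzzifiers m1 < m2."""
    u1, u2 = _fcm_memberships(X, V, m1), _fcm_memberships(X, V, m2)
    return np.minimum(u1, u2), np.maximum(u1, u2)


def _weighted_centroids(X, w, U, m):
    """Centroids weighted by point weights w and memberships U**m."""
    Wm = w[None, :] * U ** m
    return (Wm @ X) / Wm.sum(axis=1, keepdims=True)


def weighted_it2fcm(X, w, V0, m1=1.5, m2=2.5, iters=100, tol=1e-6):
    """Simplified, hypothetical IT2FCM on weighted points."""
    V, m_bar = np.array(V0, dtype=float), 0.5 * (m1 + m2)
    for _ in range(iters):
        lower, upper = _interval_memberships(X, V, m1, m2)
        # Simplified type reduction (NOT Karnik-Mendel): average the centroids
        # obtained from the lower and upper membership bounds.
        V_new = 0.5 * (_weighted_centroids(X, w, lower, m_bar)
                       + _weighted_centroids(X, w, upper, m_bar))
        converged = np.abs(V_new - V).max() < tol
        V = V_new
        if converged:
            break
    lower, upper = _interval_memberships(X, V, m1, m2)
    return V, 0.5 * (lower + upper)  # defuzzified memberships; columns sum to 1


def _maximin_init(X, c, rng):
    """Hypothetical stand-in for ACO seeding: farthest-point initialization."""
    idx = [int(rng.integers(len(X)))]
    for _ in range(c - 1):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    return X[idx]


def incremental_it2fcm(chunks, c, seed=0, **kw):
    """Cluster chunk by chunk, carrying only c weighted centroids forward."""
    rng = np.random.default_rng(seed)
    V = wV = None
    for chunk in chunks:
        X = np.asarray(chunk, dtype=float)
        w = np.ones(len(X))
        if V is None:
            V0 = _maximin_init(X, c, rng)
        else:
            # Previous centroids re-enter as pseudo-points weighted by their mass.
            X, w, V0 = np.vstack([V, X]), np.concatenate([wV, w]), V
        V, U = weighted_it2fcm(X, w, V0, **kw)
        wV = U @ w  # membership mass each centroid now summarizes
        # The raw points of this chunk are not kept for the next iteration.
    return V, wV


# Usage: two synthetic blobs streamed in six chunks of 100 points.
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.0, 0.3, (300, 2)), rng.normal(5.0, 0.3, (300, 2))])
rng.shuffle(data)
V, wV = incremental_it2fcm(np.array_split(data, 6), c=2)
print(np.round(V, 2), np.round(wV, 1))
```

Because the defuzzified memberships of each point sum to 1, the centroid weights always add up to the number of points seen so far, so only c centroids and c weights need to stay in memory between chunks.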

Highlights

  • Today, data streams are prevalent in almost every application in the real world

  • The spIT2FCM-ant colony optimization (ACO) does not produce full data partitions; rather, it works on chunks of data and outputs the final cluster centroids

  • This paper presents an improved spIT2FCM-ACO algorithm for large data streams


Introduction

Data streams are prevalent in almost every real-world application. A data stream is defined as voluminous data arriving continuously and most likely evolving over time with unknown dynamics [1]. Examples of applications involving streaming data include fraud detection, weather monitoring, the Internet of Things, and website and network monitoring [2,3,4,5]. In such complex real-world problems, uncertainty is likely to arise from inadequate, incomplete, untrustworthy, vague, and inconsistent data [6]. Each kind of information deficiency may give rise to a different type of uncertainty. A type-2 fuzzy set $\tilde{A}$ is characterized by a type-2 membership function $\mu_{\tilde{A}}(x, u)$ [31].
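For reference, the type-2 fuzzy set definition mentioned above is commonly written as follows in the type-2 fuzzy literature. The interval special case and the two-fuzzifier membership bounds used by IT2FCM are shown as the usual construction; they are assumed here rather than taken from this paper.

```latex
% General type-2 fuzzy set over universe X, with primary memberships J_x
\tilde{A} = \{\, ((x,u), \mu_{\tilde{A}}(x,u)) \mid \forall x \in X,\ \forall u \in J_x \subseteq [0,1] \,\},
\qquad 0 \le \mu_{\tilde{A}}(x,u) \le 1

% Interval type-2 case: \mu_{\tilde{A}}(x,u) = 1 everywhere, so the set is fully
% described by its footprint of uncertainty with lower/upper membership bounds
J_x = [\, \underline{\mu}_{\tilde{A}}(x),\ \overline{\mu}_{\tilde{A}}(x) \,]

% IT2FCM: type-1 FCM memberships for fuzzifier m (c clusters, distance d_{ij})
u_{ij}^{(m)} = \Big( \sum_{k=1}^{c} \big( d_{ij} / d_{kj} \big)^{2/(m-1)} \Big)^{-1}

% Interval bounds from two fuzzifiers m_1 < m_2
\overline{u}_{ij} = \max\big(u_{ij}^{(m_1)}, u_{ij}^{(m_2)}\big), \qquad
\underline{u}_{ij} = \min\big(u_{ij}^{(m_1)}, u_{ij}^{(m_2)}\big)
```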

