Abstract

Emerging technologies generate an enormous amount of data every day, termed Big Data, which can be of great use in various domains. Clustering algorithms that load the entire dataset into memory for analysis become infeasible when the dataset is too large. Many clustering algorithms in the literature address the analysis of huge amounts of data. This paper presents a new clustering approach, the Incremental Random Sampling with Iterative Optimization Fuzzy c-Means (IRSIO-FCM) algorithm. It is implemented on Apache Spark, a framework for Big Data processing. Spark works well for iterative algorithms because it supports in-memory computation and scalability. IRSIO-FCM not only facilitates effective clustering of Big Data but also optimizes storage space during clustering. To establish a fair comparison for IRSIO-FCM, we propose an incremental version of Literal Fuzzy c-Means (LFCM), called ILFCM, also implemented on Apache Spark. The experimental results are analyzed in terms of time and space complexity, normalized mutual information (NMI), adjusted Rand index (ARI), speedup, sizeup, and scaleup. The reported results show that IRSIO-FCM achieves a significant reduction in run-time compared with ILFCM.
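
For readers unfamiliar with the base method, the following is a minimal single-machine sketch of the standard (literal) fuzzy c-means iteration that both IRSIO-FCM and ILFCM build on. It is not the paper's algorithm: it has no random sampling, no incremental processing, and no Spark. The function name `fcm` and its parameters (`points`, `c`, the fuzzifier `m`, `max_iter`, `tol`, `seed`) are illustrative assumptions, not names from the paper.

```python
import random


def fcm(points, c, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Plain fuzzy c-means on a list of equal-length numeric tuples.

    Illustrative sketch only; all names and defaults are assumptions.
    Returns (centers, u) where u[i][j] is the membership of point i in cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        u.append([v / s for v in row])

    centers = []
    for _ in range(max_iter):
        # Center update: mean of the points weighted by u_ij ** m.
        centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            centers.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / total
                for d in range(dim)))

        # Membership update from squared distances to the new centers.
        shift = 0.0
        for i in range(n):
            d2 = [sum((points[i][d] - centers[j][d]) ** 2 for d in range(dim))
                  for j in range(c)]
            if min(d2) == 0.0:
                # Point coincides with a center: full membership there.
                hit = d2.index(0.0)
                new_row = [1.0 if j == hit else 0.0 for j in range(c)]
            else:
                # u_ij is proportional to d_ij ** (-2 / (m - 1));
                # d2 already holds the squared distance.
                inv = [dist ** (-1.0 / (m - 1.0)) for dist in d2]
                s = sum(inv)
                new_row = [v / s for v in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break
    return centers, u


# Hypothetical toy data: two well-separated groups in 2-D.
toy = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
       (10.0, 10.0), (10.1, 9.9), (9.9, 10.2)]
toy_centers, toy_u = fcm(toy, c=2)
```

Each iteration touches every point, which is why holding the full dataset in memory becomes the bottleneck that the abstract describes; the sampling and incremental strategies in the paper target exactly that cost.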

