Abstract

This paper examines clustering, an unsupervised machine learning task in which the data carry no labels. Many algorithms have been designed to solve clustering problems, and several families of approaches have been developed to improve their efficiency and effectiveness: partitioning-based, hierarchical-based, density-based, grid-based, and model-based. As data volumes grow every second, we now face so-called big data, which has compelled researchers to adapt algorithms from these approaches so that they can process large data warehouses quickly. Our main purpose is to compare a representative algorithm from each approach against the principal big data criteria, known as the 4Vs: Volume, Variety, Velocity, and Value. The comparison aims to determine which algorithms can efficiently mine information by clustering big data. The studied algorithms are FCM, CURE, OPTICS, BANG, and EM, one from each of the approaches listed above. Assessing these algorithms against the 4Vs reveals weaknesses in several of them. All of the evaluated algorithms cluster large datasets well, but FCM and OPTICS suffer from the curse of dimensionality. FCM and EM are highly sensitive to outliers, which degrades their results. FCM, CURE, and EM require the number of clusters as input, a drawback when the optimal number is not chosen. FCM and EM produce spherical clusters, whereas CURE, OPTICS, and BANG produce arbitrarily shaped clusters, which benefits cluster quality. FCM is the fastest on big data, while EM takes the longest to train. Regarding variety of data types, CURE handles both numerical and categorical data. The analysis leads us to conclude that both CURE and BANG cluster big data efficiently, although CURE loses some accuracy in data assignment. We therefore consider BANG the most appropriate algorithm for clustering large, high-dimensional, noisy datasets. BANG is based on a grid structure but implicitly combines partitioning, hierarchical, and density approaches, which explains its accurate results. Even so, ultimate clustering accuracy has not yet been reached, only approached. The lesson drawn from BANG, namely mixing approaches, should be applied to more algorithms in order to attain the accuracy and effectiveness that lead to sound future decisions.
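
As a minimal, hedged illustration of the kind of contrast the comparison draws (not the paper's own experimental setup), the following Python sketch uses scikit-learn on synthetic two-moon data: the model-based EM algorithm (GaussianMixture) must be given the number of clusters and fits roughly elliptical components, while the density-based OPTICS algorithm finds arbitrarily shaped clusters and flags noise without a preset cluster count. The dataset, parameters, and library choice are illustrative assumptions only.

# Sketch contrasting a model-based and a density-based clustering approach
# on non-spherical synthetic data (illustrative only; not the paper's setup).
import numpy as np
from sklearn.datasets import make_moons
from sklearn.mixture import GaussianMixture   # EM for Gaussian mixtures
from sklearn.cluster import OPTICS            # density-based clustering

# Two interleaving half-moons with mild noise: clusters are not spherical.
X, _ = make_moons(n_samples=1000, noise=0.05, random_state=0)

# Model-based (EM): the cluster count must be supplied as input (k = 2).
em_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(X)

# Density-based (OPTICS): no cluster count needed; label -1 marks noise.
optics_labels = OPTICS(min_samples=20).fit_predict(X)

print("EM cluster sizes:    ", np.bincount(em_labels))
print("OPTICS cluster sizes:", np.bincount(optics_labels[optics_labels >= 0]))
print("OPTICS noise points: ", int(np.sum(optics_labels == -1)))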
