Abstract

Big data technologies have become widely used for storing, processing, and handling huge data sets. However, clustering such data sets remains a major challenge in big data analysis, and conventional clustering algorithms have relied on scalable solutions to manage them. This study therefore proposes a technique for big data clustering using the Spark architecture. The technique clusters big data in two steps on the Spark architecture: feature selection in the initial cluster nodes, followed by clustering in the final cluster nodes. First, the initial cluster nodes read the big data from various distributed systems, and the optimal features are selected and placed in the feature vector by the proposed moth-flame optimization-based Bat (MFO-Bat) algorithm, which integrates the MFO and Bat algorithms. The selected features are then fed to the final cluster nodes of Spark, which use the sparse-fuzzy C-means method to perform optimal clustering. The proposed MFO-Bat outperformed existing methods, with a maximal classification accuracy of 95.806%, a Dice coefficient of 99.181%, and a Jaccard coefficient of 98.376%.
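To make the clustering step more concrete, the following is a minimal sketch of standard fuzzy C-means in plain Python. It is an illustration under stated assumptions, not the method of this study: it omits the sparse variant, the MFO-Bat feature selection, and any distribution over Spark nodes. The function name `fuzzy_c_means`, its parameters, and the toy data are hypothetical and chosen only for this example.

```python
import math

def fuzzy_c_means(points, centers, m=2.0, max_iter=100, tol=1e-6):
    """Standard fuzzy C-means (NOT the sparse variant named in the abstract).

    points  : list of equal-length numeric tuples
    centers : initial cluster centers, supplied by the caller (assumption)
    m       : fuzzifier, must be greater than 1
    Returns (centers, memberships); memberships[i][j] is the degree to which
    points[i] belongs to cluster j, as computed in the last iteration.
    """
    centers = [tuple(c) for c in centers]
    exponent = 2.0 / (m - 1.0)
    dims = len(points[0])
    memberships = []
    for _ in range(max_iter):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # The point sits exactly on one or more centers:
                # share full membership among those centers only.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((dj / dk) ** exponent for dk in dists)
                       for dj in dists]
            memberships.append(row)

        # Center update: mean of the points weighted by u_ij^m
        new_centers = []
        for j, old in enumerate(centers):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            if total == 0.0:
                new_centers.append(old)  # no membership mass: keep old center
                continue
            new_centers.append(tuple(
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dims)
            ))

        # Stop once no center moves by more than tol
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships

# Toy data: two well-separated groups in 2-D (illustrative only)
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, memberships = fuzzy_c_means(points, centers=[(0.0, 0.0), (5.0, 5.0)])

# Harden the fuzzy memberships into one label per point
labels = [max(range(len(row)), key=row.__getitem__) for row in memberships]
```

Each row of `memberships` sums to 1, so a point near a cluster boundary can belong partly to several clusters. Taking the largest membership per row, as in `labels`, is one simple way to obtain hard assignments that could then be scored with measures such as the Dice or Jaccard coefficient.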
