Abstract

Clustering is a fundamental component of Big Data processing and analysis, and it is central to the challenges of handling massive datasets. As an unsupervised learning method, clustering aims to organize large datasets into homogeneous groups without relying on pre-existing labels. Our approach seeks to optimize this process by combining three techniques: fuzzy C-Means (FCM), an optimized encoder–decoder convolutional neural network (CNN), and a bidirectional long short-term memory network (BiLSTM). This combination brings together the supervised and unsupervised paradigms. The BiLSTM processes each data sequence in both the forward and backward directions using LSTM cells, which improves the representation of sequential structure, an important property in Big Data clustering. FCM is improved by introducing a function that computes the separation between each instance and the cluster center, which increases clustering precision. To improve performance and reduce computation time, our methodology uses the optimized encoder–decoder CNN, whose refined architecture extracts data features more efficiently and thereby improves clustering quality. We evaluate the approach on the Fashion-MNIST dataset using accuracy, the adjusted Rand index (ARI), and normalized mutual information (NMI). In comparative analyses, our approach significantly outperforms existing models, demonstrating its effectiveness for Big Data clustering.
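
For readers unfamiliar with FCM, the following is a minimal sketch of the textbook algorithm, in which memberships depend on the distance between each instance and each cluster center. It is not the improved variant proposed in this work: the Euclidean distance, the fuzzifier value `m = 2`, the random initialization, and the function name `fuzzy_c_means` are all assumptions made for illustration.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Textbook fuzzy C-Means (illustrative sketch, not the paper's variant).

    X: array of shape (n_samples, n_features).
    Returns (centers, U), where U[i, k] is the membership of sample i in cluster k.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row is normalized to sum to 1.
    U = rng.random((n, n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        Um = U ** m
        # Cluster centers are membership-weighted means of the data.
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Assumed separation function: Euclidean distance from each
        # instance to each center, clipped to avoid division by zero.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)
        # Standard update: u_ik is proportional to d_ik^(-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

In a pipeline like the one described above, `X` would presumably hold the features produced by the encoder rather than raw pixels, and hard cluster labels can be read off with `U.argmax(axis=1)`.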
