Big Data is an emerging technology with enormous potential to advance business and its administration. Because of the sheer volume of data involved, efficient data mining and clustering methods are crucial for extracting meaningful insights and patterns from large-scale datasets. Difficulties arise in capturing, storing, sharing, analyzing, and visualizing the data. Several methods have already been proposed for mining knowledge from big data. However, applying these methods on a single machine is inefficient or outright impossible, because big data are frequently acquired from dispersed locations and stored across several machines. Matrix decomposition is one of the key strategies for retrieving knowledge from the diverse, noisy, and very large data generated by modern applications and stored in dispersed locations. This study proposes a novel approach called the Rank-Revealing QR Matrix and Schur Decomposition Method (RRQR-SDM), designed specifically for big data mining and clustering tasks. RRQR-SDM reveals the rank of the data matrix in a computationally efficient manner by using a modified QR decomposition, eliminating the need for expensive Singular Value Decomposition (SVD) computations. The proposed method offers several advantages over existing approaches. First, by exploiting the inherent low-rank structure of the data, it reduces the computational complexity associated with large-scale datasets; revealing the rank of the input matrix enables dimensionality reduction and efficient data compression. Second, the Schur decomposition enhances the interpretability of the data by providing a clear separation between the relevant and irrelevant components. This feature makes RRQR-SDM particularly suitable for data mining and clustering tasks in which identifying the most significant features is essential. To evaluate the performance of RRQR-SDM, extensive experiments were conducted on a variety of large-scale datasets. 
The results demonstrate that the proposed method outperforms state-of-the-art techniques in both computational efficiency and clustering accuracy.
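
To make the rank-revealing idea concrete, the following is a minimal sketch of how a column-pivoted QR factorization can estimate the numerical rank of a matrix without computing an SVD. It is not the authors' RRQR-SDM; it is a generic textbook variant (modified Gram-Schmidt with column pivoting), and the function name `pivoted_qr_rank` and the relative tolerance `tol` are assumptions introduced only for illustration.

```python
import math


def pivoted_qr_rank(A, tol=1e-10):
    """Estimate the numerical rank of A (a list of rows) using
    column-pivoted modified Gram-Schmidt.

    Hypothetical helper for illustration only; `tol` is an assumed
    relative threshold on the diagonal entries of R.
    Returns (rank, pivots), where pivots lists the original indices
    of the columns selected as most significant, in order.
    """
    m, n = len(A), len(A[0])
    # Store the matrix column-wise: cols[j] is column j.
    cols = [[float(A[i][j]) for i in range(m)] for j in range(n)]
    perm = list(range(n))
    rank = 0
    first = None  # magnitude of the first (largest) diagonal entry of R

    for k in range(min(m, n)):
        # Pivot: choose the remaining column with the largest residual norm.
        norms = [math.sqrt(sum(x * x for x in cols[j])) for j in range(k, n)]
        p = k + max(range(len(norms)), key=norms.__getitem__)
        cols[k], cols[p] = cols[p], cols[k]
        perm[k], perm[p] = perm[p], perm[k]

        rkk = norms[p - k]  # diagonal entry R[k][k]
        if first is None:
            first = rkk
        # Stop once the remaining columns are numerically negligible.
        if rkk <= tol * first:
            break

        # Normalize the pivot column and remove its component
        # from every remaining column.
        q = [x / rkk for x in cols[k]]
        for j in range(k + 1, n):
            dot = sum(qi * x for qi, x in zip(q, cols[j]))
            cols[j] = [x - dot * qi for x, qi in zip(cols[j], q)]
        rank += 1

    return rank, perm[:rank]


# Third column equals the sum of the first two, so the rank is 2.
A = [[1, 0, 1],
     [0, 1, 1],
     [1, 1, 2],
     [2, 1, 3]]
rank, pivots = pivoted_qr_rank(A)
```

The returned pivots indicate which original columns carry the most independent information, which is the sense in which such a factorization supports dimensionality reduction and feature selection. This sketch favors readability over numerical robustness; practical implementations typically rely on Householder-based pivoted QR routines.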