Big Data Datasets Research Articles

Big Data is an emerging technology with enormous potential for developing businesses and their administration. Because of the sheer volume involved, efficient data mining and clustering methods are crucial for extracting meaningful insights and patterns from large-scale datasets. Problems may arise from the need to capture, store, share, analyze, and visualize the data. Several methods have already been proposed for mining knowledge from big data. However, handling such massive data with these methods on a single machine is inefficient or outright impossible, because big data are frequently acquired from dispersed locations and stored across several machines. Matrix decomposition is one of the critical strategies for retrieving knowledge from the diverse, noisy, and huge data that modern applications generate and store in dispersed locations. This study proposes a novel approach, the Rank-Revealing QR Matrix and Schur Decomposition Method (RRQR-SDM), designed specifically for big data mining and clustering. RRQR-SDM reveals the rank of the data matrix in a computationally efficient manner by using a modified QR decomposition, eliminating the need for expensive Singular Value Decomposition (SVD) computations. The method offers several advantages over existing approaches. First, by exploiting the inherent low-rank structure of the data, it reduces the computational complexity associated with large-scale datasets; revealing the rank of the input matrix also enables dimensionality reduction and efficient data compression. Second, the Schur decomposition enhances interpretability by clearly separating relevant from irrelevant components. This makes RRQR-SDM particularly suitable for data mining and clustering tasks in which identifying the most significant features is essential. To evaluate its performance, extensive experiments were conducted on various big data datasets.
The results demonstrate that the proposed method outperforms state-of-the-art techniques in both computational efficiency and clustering accuracy.
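The abstract does not describe the authors' modified QR decomposition, so the following is only a minimal sketch of the textbook idea it builds on, under that assumption: QR factorization with column pivoting (Businger-Golub) usually, though not always, exposes the numerical rank of a matrix through the decreasing magnitudes of the diagonal of R, without computing an SVD. The function name `pivoted_qr_rank` and the relative tolerance `tol` are hypothetical choices for illustration, and the Schur-decomposition stage of RRQR-SDM is not reproduced.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def pivoted_qr_rank(a: Matrix, tol: float = 1e-10) -> Tuple[int, List[int], List[float]]:
    """Estimate the numerical rank of an m x n matrix via QR with column pivoting.

    Hypothetical helper (not the RRQR-SDM algorithm). Returns
    (rank, column permutation, |R_kk| values). No SVD is computed.
    """
    m, n = len(a), len(a[0])
    # Store the matrix column by column; columns are orthogonalised in place.
    cols = [[a[i][j] for i in range(m)] for j in range(n)]
    perm = list(range(n))
    r_diag: List[float] = []
    for k in range(min(m, n)):
        # Businger-Golub pivoting: bring the remaining column with the largest
        # residual norm to position k.
        norms = [math.sqrt(sum(x * x for x in cols[j])) for j in range(k, n)]
        p = k + max(range(len(norms)), key=norms.__getitem__)
        cols[k], cols[p] = cols[p], cols[k]
        perm[k], perm[p] = perm[p], perm[k]
        rkk = norms[p - k]
        r_diag.append(rkk)
        if rkk == 0.0:
            break  # every remaining column is exactly zero
        q = [x / rkk for x in cols[k]]
        # Modified Gram-Schmidt: remove the q-component from the remaining columns.
        for j in range(k + 1, n):
            dot = sum(qi * cj for qi, cj in zip(q, cols[j]))
            cols[j] = [cj - dot * qi for qi, cj in zip(q, cols[j])]
    # With pivoting |R_kk| is non-increasing, so the rank is the number of
    # diagonal entries that are not negligible relative to |R_11|.
    if not r_diag or r_diag[0] == 0.0:
        return 0, perm, r_diag
    rank = sum(1 for d in r_diag if d > tol * r_diag[0])
    return rank, perm, r_diag


# Toy example: column 3 = column 1 + column 2, so the numerical rank is 2.
data = [
    [1.0, 2.0, 3.0],
    [2.0, 4.0, 6.0],
    [1.0, 0.0, 1.0],
]
rank, perm, _ = pivoted_qr_rank(data)
kept_columns = perm[:rank]  # a column subset spanning (numerically) the same space
```

Keeping only the first `rank` pivoted columns is one simple way to obtain the dimensionality reduction the abstract mentions. Production code would call a LAPACK-backed routine, for example SciPy's `scipy.linalg.qr` with `pivoting=True`, rather than pure-Python loops.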

A number of strategies enabled by the Internet of Things (IoT) have been used to predict serious diseases such as diabetes, for which continuous, real-time monitoring systems are especially prevalent. Wearable medical devices with sensors continuously produce large volumes of data, referred to as big data. Given the high rate at which these data are generated, it becomes cumbersome to accumulate, process, and analyze them during an emergency. Deep Learning (DL) approaches have been widely used for recognizing patterns, classifying objects, and predicting diabetes at an early stage. Even so, DL models are inherently slow to evaluate when the volume of diabetes data is large. To meet the expectations placed on DL in big data applications fed by electronic devices, the evaluation procedure for diabetes prediction must be sped up so that an early analysis can be reached. In this work, a method called Equidistant Heuristic and Duplex Deep Neural Network (EH-DDNN) is proposed for early diabetic disease prediction. First, taking the big data dataset as input, an Equidistant Heuristic Pruning (EHP) algorithm is presented for feature selection. EHP splits the input data matrix into rows and columns separately. Using conditional non-alignment assessment and heuristic techniques, EHP exploits neighbourhood evaluations within each subdivision while reducing communication time and overhead, thereby correlating computations on a large scale. This removes irrelevant and redundant features, leaving fewer features and making early prediction easier. Next, using the retained features, a Duplex Deep Neural Network (DDNN) is designed for early prediction; it fuses nonlinear processing features with a linear response to cope with the large volume of accumulated data.
Experiments were conducted and validated on two benchmark datasets: the Diabetes Data Set from the UCI repository and the Pima Indians Diabetes Disease dataset. A comparative analysis is made of diabetic disease prediction time, prediction overhead, and ROC curves.

Articles published on Big Data Datasets

Developing Big Data anomaly dynamic and static detection algorithms: AnomalyDSD spark package

The role of Big Data and Data Science in the context of information security and cybersecurity

A novel efficient Rank-Revealing QR matrix and Schur decomposition method for big data mining and clustering (RRQR-SDM)

Explainable machine learning models for Medicare fraud detection

ChatGPT and Big Data: Enhancing Text-to-Speech Conversion

InterCriteria Analysis as a tool for analyzing Big Data datasets: Case study of 2021 national statistics of Bulgarian system of higher education

The Longitudinal IntermediaPlus (2014–2016): A Case Study in Structuring Unstructured Big Data

Prediction of drug response in major depressive disorder using ensemble of transfer learning with convolutional neural network based on EEG

An IoT based big data framework using equidistant heuristic and duplex deep neural network for diabetic disease prediction

Mining of High-Utility Patterns in Big IoT-based Databases

Cooperative co-evolution for feature selection in Big Data with random feature grouping

Adaptive Fuzzy Map Approach for Accruing Velocity of Big Data Relies on Fireflies Algorithm for Decentralized Decision Making

Hybrid neural networks for big data classification

From Big to Smart Data: Iterative ensemble filter for noise filtering in Big Data classification

DPASF: a flink library for streaming data preprocessing

D2D Task Offloading: A Dataset-Based Q&A

Evaluating associative classification algorithms for Big Data

Hybrid Parallel Linguistic Fuzzy Rules with Canopy MapReduce for Big Data Classification in Cloud

SMOTE-BD: An Exact and Scalable Oversampling Method for Imbalanced Classification in Big Data

A taxonomy of software-based and hardware-based approaches for energy efficiency management in the Hadoop