Abstract

Every day we experience unprecedented data growth from numerous sources, which contribute to big data in terms of volume, velocity, and variability. These datasets pose great challenges to analytics frameworks and computational resources, making it difficult to extract meaningful information in a timely manner. Developing an efficient big data analytics framework to address these challenges is therefore an important research topic. Machine learning (ML) and deep learning (DL) algorithms are increasingly used in such frameworks to exploit non-linear relationships in very large, high-dimensional datasets. Apache Spark has emerged as one of the fastest big data processing engines and supports iterative ML tasks through its distributed ML library, Spark MLlib. For real-world research problems, DL architectures such as Long Short-Term Memory (LSTM) networks are effective at overcoming practical issues of conventional deep architectures, including reduced accuracy, long-term sequence dependencies, and vanishing and exploding gradients. In this paper, we propose an efficient analytics framework: a progressive machine learning technique that merges Spark-based linear models, a Multilayer Perceptron (MLP), and an LSTM in a two-stage cascade structure to enhance predictive accuracy. The proposed architecture enables us to organize big data analytics in a scalable and efficient way. To show the effectiveness of our framework, we applied the cascading structure to two different real-life datasets, solving a multiclass and a binary classification problem, respectively. Experimental results show that our framework outperforms state-of-the-art approaches with a high level of classification accuracy.
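The abstract does not spell out how the two cascade stages are coupled, so the sketch below shows one plausible reading only: a Spark MLlib linear model produces class probabilities in stage 1, and these are appended to the original features and fed to a Keras LSTM in stage 2. The file name data.csv, the label column, the length-1 sequence windowing, and the choice of logistic regression as the stage-1 model are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

spark = SparkSession.builder.appName("two-stage-cascade").getOrCreate()

# Stage 1: a Spark MLlib linear model.
# 'data.csv' and the 'label' column are placeholders; all other columns are assumed numeric.
df = spark.read.csv("data.csv", header=True, inferSchema=True)
assembler = VectorAssembler(
    inputCols=[c for c in df.columns if c != "label"], outputCol="features")
train, test = assembler.transform(df).randomSplit([0.8, 0.2], seed=42)

lr = LogisticRegression(featuresCol="features", labelCol="label", maxIter=50)
stage1 = lr.fit(train)
scored = stage1.transform(train)  # adds 'probability' and 'prediction' columns

# Stage 2: original features plus stage-1 class probabilities go into an LSTM.
# Treating each sample as a length-1 sequence is an assumption made for this sketch;
# a held-out split would be scored separately for evaluation.
pdf = scored.select("features", "probability", "label").toPandas()
X = np.array([f.toArray().tolist() + p.toArray().tolist()
              for f, p in zip(pdf["features"], pdf["probability"])])
X = X.reshape((X.shape[0], 1, X.shape[1]))   # (samples, timesteps, features)
y = pdf["label"].to_numpy()

lstm = Sequential([
    LSTM(64, input_shape=(1, X.shape[2])),
    Dense(1, activation="sigmoid"),          # binary case; use softmax for multiclass
])
lstm.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
lstm.fit(X, y, epochs=10, batch_size=32)
```

For the multiclass (arrhythmia) case, the final Dense layer would use a softmax output with categorical cross-entropy instead of the sigmoid shown here.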

Highlights

  • Every day we experience unprecedented data growth from numerous sources, which contribute to big data in terms of volume, velocity, and variability

  • We provide an overall perspective by evaluating several significant concepts and previous work in the big data, Spark, machine learning (ML), deep learning (DL) and cascade learning (CL) domains

  • The literature review related to this article is discussed in four categories, i.e., related work on Spark, ML, DL with Multilayer Perceptron (MLP) and Long Short-Term Memory (LSTM), and CL

Introduction

Every day we experience unprecedented data growth from numerous sources, which contribute to big data in terms of volume, velocity, and variability. Big data infrastructures have been developed to support analytics with fast, reliable, and versatile computational designs, providing quality attributes such as flexibility, accessibility, and on-demand, easy-to-use resource pooling [3,4]. This steadily growing requirement plays a vital role in improving large-scale industrial data analytics frameworks. This article examines a more proficient framework for massive data processing, Apache Spark, a big data processing tool for distributed computing that is well suited to iterative machine learning (ML).
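As a concrete illustration of iterative ML on Spark, the short PySpark sketch below trains Spark MLlib's Multilayer Perceptron classifier on the multiclass sample data shipped with the Spark distribution; the data path and the layer sizes are illustrative and would differ for the datasets used later in the paper.

```python
from pyspark.sql import SparkSession
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

spark = SparkSession.builder.appName("spark-mlp-demo").getOrCreate()

# Sample multiclass data in LIBSVM format (ships with the Spark distribution;
# the path depends on where Spark is installed).
data = spark.read.format("libsvm") \
    .load("data/mllib/sample_multiclass_classification_data.txt")
train, test = data.randomSplit([0.7, 0.3], seed=1234)

# layers = [input size, hidden layer size, number of classes]; values are dataset-dependent
mlp = MultilayerPerceptronClassifier(layers=[4, 8, 3], maxIter=100, seed=1234)
model = mlp.fit(train)

preds = model.transform(test)
accuracy = MulticlassClassificationEvaluator(metricName="accuracy").evaluate(preds)
print(f"Test accuracy: {accuracy:.3f}")

spark.stop()
```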

Outline

  • Background and Related Work
  • Proposed
  • Overview of the Architecture
  • Support
  • Computation Time
  • Continuous Learning Improvement
  • Proposed Framework Implementation
  • Description of the Dataset
  • Cardiac Arrhythmia Classification
  • Recurrent
  • Identifying Malicious URLs
  • Experimental Setup
  • Stage 1 Classification Analysis
  • Stage 2 Classification Analysis
  • Method
  • Conclusions and Outlook