An Imbalanced Big Data Mining Framework for Improving Optimization Algorithms Performance

Eslam Mohsen Hassib, Sally M El-Ghamrawy, El-Sayed M El-Kenawy, Ali Ibrahim El-Desouky

doi:10.1109/access.2019.2955983

Open Access

https://doi.org/10.1109/access.2019.2955983

Abstract

Big data is an important factor in almost all of today's technologies, such as social media, smart cities, and the internet of things. Most standard classifiers tend to become trapped in local optima when dealing with such massive data sets. Hence, new techniques for handling such data are required. This paper presents a novel imbalanced big data mining framework that improves the performance of optimization algorithms by eliminating the local optima problem. The framework consists of three main stages. First, the preprocessing stage uses the LSH-SMOTE algorithm to solve the class imbalance problem, and then uses the LSH algorithm to hash the data set instances into buckets. Second, the bucket search stage uses the grey wolf optimizer (GWO) to train a bidirectional recurrent neural network (BRNN) and search for the global optimum in each bucket. Last, the final tournament winner stage uses GWO+BRNN to find the global optimum of the whole data set among the global optima from all buckets. The proposed framework, LSHGWOBRNN, was tested on nine data sets, one of which is a big data set, in terms of AUC and MSE against seven well-known machine learning algorithms (Naive Bayes, Random Tree, Decision Table, AdaBoostM1, WOA+MLP, GWO+MLP, and WOA+BRNN). It was then tested on four well-known data sets against GWO+MLP, ACO+MLP, GA+MLP, PSO+MLP, PBIL+MLP, and ES+MLP in terms of classification accuracy and MSE. The experimental results show that LSHGWOBRNN provides strong local optima avoidance and higher accuracy, with less complexity and overhead.
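The three stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: random-hyperplane hashing stands in for the LSH scheme, a small linear model scored by MSE stands in for the BRNN, the tournament simply re-scores each bucket winner on the whole data set, and the LSH-SMOTE oversampling step is omitted. All function names and parameter values (`lsh_buckets`, `gwo`, `bucket_tournament`, `n_planes`, `n_wolves`, and so on) are hypothetical.

```python
import random

def lsh_buckets(points, n_planes=2, seed=0):
    """Stage 1 (assumed variant): bucket points by the sign pattern of
    their projections onto random hyperplanes."""
    rng = random.Random(seed)
    dim = len(points[0])
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]
    buckets = {}
    for idx, p in enumerate(points):
        key = tuple(sum(w * x for w, x in zip(plane, p)) >= 0 for plane in planes)
        buckets.setdefault(key, []).append(idx)
    return buckets

def mse(weights, rows):
    """MSE of a linear model; the linear model is a stand-in for the BRNN."""
    *w, b = weights
    total = 0.0
    for x, y in rows:
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        total += (pred - y) ** 2
    return total / len(rows)

def gwo(fitness, dim, n_wolves=12, n_iter=60, bound=5.0, seed=0):
    """Stage 2: basic Grey Wolf Optimizer update rule, minimizing `fitness`."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=fitness)[:]
    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)  # shrinks linearly from 2 toward 0
        # Copies of the three best wolves: alpha, beta, delta.
        leaders = [w[:] for w in sorted(wolves, key=fitness)[:3]]
        for w in wolves:
            for d in range(dim):
                steps = []
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * leader[d] - w[d])
                    steps.append(leader[d] - A * D)
                # New position is the mean of the three leader-guided steps,
                # clamped to the search bounds.
                w[d] = max(-bound, min(bound, sum(steps) / 3))
        cand = min(wolves, key=fitness)
        if fitness(cand) < fitness(best):
            best = cand[:]
    return best

def bucket_tournament(points, targets, seed=0):
    """Stage 3 (simplified): run GWO per bucket, then keep the bucket
    winner with the lowest MSE on the whole data set."""
    rows = list(zip(points, targets))
    dim = len(points[0]) + 1  # one weight per feature plus a bias
    candidates = []
    for i, idxs in enumerate(lsh_buckets(points, seed=seed).values()):
        bucket_rows = [rows[j] for j in idxs]
        candidates.append(gwo(lambda w: mse(w, bucket_rows), dim, seed=seed + i))
    return min(candidates, key=lambda w: mse(w, rows))

if __name__ == "__main__":
    rng = random.Random(1)
    pts = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(40)]
    ys = [2 * x0 - x1 + 0.5 for x0, x1 in pts]  # synthetic targets
    winner = bucket_tournament(pts, ys)
    print("winning weights:", winner)
    print("whole-set MSE:", mse(winner, list(zip(pts, ys))))
```

Splitting the search by bucket means each optimizer run sees only a subset of the data; the final comparison across bucket winners is what ties the result back to the whole data set.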

Highlights

The rapid growth of smart devices, the internet of things, smart cities, and massive sensor networks is flooding the world with a gigantic amount of data generated from numerous sources, such as social networks, sensor networks, video broadcasting sites, bioinformatics, and internet marketing.
The proposed framework, LSHGWOBRNN, is tested against seven classifiers: Naive Bayes, AdaBoostM1, Decision Table, and Random Tree, in addition to Grey Wolf Optimizer (GWO)+Multilayer Perceptron (MLP), published in 2015 [62], WOA+MLP, and WOA+Bidirectional Recurrent Neural Network (BRNN) [71]. The tests are performed over eight highly imbalanced data sets obtained from the KEEL Data Set Repository (imbalance ratio higher than 9) [63] and one big data set that was used in the ECBDL 14 Big Data Mining Competition 2014 [64].
First experiment: the proposed framework, LSHGWOBRNN, is tested against seven classifiers [71] over nine highly imbalanced data sets in two sub-experiments, one without preprocessing and one with Locality Sensitive Hashing (LSH)-SMOTE preprocessing, in terms of area under the ROC curve (AUC) and mean square error (MSE) (local optima avoidance).
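The quantities named in these highlights (imbalance ratio, AUC, and MSE) can be computed as in the following minimal sketch. It assumes binary labels coded as 0 and 1 with both classes present, and it computes AUC by the pairwise-ranking definition rather than by any particular library routine. The function names are hypothetical and the sample values are illustrative, not results from the paper.

```python
def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count.
    Assumes binary 0/1 labels with both classes present."""
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    return max(pos, neg) / min(pos, neg)

def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def mean_squared_error(labels, scores):
    """Average squared difference between targets and predicted scores."""
    return sum((y - s) ** 2 for y, s in zip(labels, scores)) / len(labels)

if __name__ == "__main__":
    # Nine majority examples per minority example gives a ratio of 9.
    print(imbalance_ratio([0] * 9 + [1]))
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
    print(mean_squared_error([0, 1], [0.5, 0.5]))
```

AUC is often preferred over plain accuracy on imbalanced data because it depends on how positives are ranked relative to negatives, not on the class proportions.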

Summary

Introduction

The rapid growth of smart devices, the internet of things, smart cities, and massive sensor networks is flooding the world with a gigantic amount of data generated from numerous sources, such as social networks, sensor networks, video broadcasting sites, bioinformatics, and internet marketing. Extracting knowledge from such vast data sets is considered one of the biggest challenges for most traditional machine learning techniques [1]. A classifier may report very good performance on the majority class but very poor performance on the minority class, since such techniques assume a balanced data distribution.
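The majority/minority gap described above can be made concrete with a small worked example. The sketch below uses made-up labels (95 majority, 5 minority) and a trivial classifier that always predicts the majority class; the helper names are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)

def recall(y_true, y_pred, cls):
    """Fraction of the examples of class `cls` that were predicted as `cls`."""
    members = [p for t, p in zip(y_true, y_pred) if t == cls]
    return sum(1 for p in members if p == cls) / len(members)

if __name__ == "__main__":
    y_true = [0] * 95 + [1] * 5   # illustrative data: 95 majority, 5 minority
    y_pred = [0] * 100            # always predict the majority class
    print("accuracy:", accuracy(y_true, y_pred))
    print("majority recall:", recall(y_true, y_pred, 0))
    print("minority recall:", recall(y_true, y_pred, 1))
```

Overall accuracy looks strong here even though no minority example is ever detected, which is why the experiments report metrics such as AUC alongside error.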

Journal: IEEE Access
Publication Date: Jan 1, 2019
Citations: 46
License type: CC BY 4.0
