Abstract

The goal of big data analytics is to analyze datasets of high volume, velocity, and variety for large-scale business intelligence problems. Such workloads are normally processed in a distributed fashion on massively parallel analytical systems. Deep learning is part of a broader family of machine learning methods based on learning representations of data, and it plays a significant role in information analysis by adding value to massive amounts of unsupervised data. A core research domain is the development of deep learning algorithms that automatically extract complex data representations at high levels of abstraction from massive volumes of data. In this paper, we present the latest research trends in parallel algorithms, optimization techniques, tools, and libraries for big data analytics and deep learning on various parallel architectures. The basic building blocks of deep learning, such as Restricted Boltzmann Machines (RBMs) and Deep Belief Networks (DBNs), are identified and analyzed for parallelization of deep learning models. We propose a parallel software API based on PyTorch, the Hadoop Distributed File System (HDFS), Apache Hadoop MapReduce, and MapReduce Job (MRJob) for developing large-scale deep learning models. We obtained about a 5-30% reduction in the execution time of a deep auto-encoder model, even on a single-node Hadoop cluster. Furthermore, the complexity of code development for multi-layer deep learning models is significantly reduced.
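The abstract describes training deep learning models in the MapReduce style: each map task computes on its own partition of the input samples, and a reduce step combines the partial results. The sketch below is a minimal, self-contained illustration of that data-parallel pattern (per-partition gradients averaged in a reduce step) for a toy one-parameter model; it is not the paper's actual API, and the function names are our own.

```python
# Illustrative sketch (not the paper's API): data-parallel training in
# the MapReduce style. Each "map" computes a gradient on one partition
# of the samples; a "reduce" averages the partial gradients.

def grad_partition(w, partition):
    """Map step: mean-squared-error gradient of y = w*x on one partition."""
    g = 0.0
    for x, y in partition:
        g += 2 * (w * x - y) * x
    return g / len(partition)

def reduce_grads(grads):
    """Reduce step: average the partial gradients from all partitions."""
    return sum(grads) / len(grads)

def train(partitions, w=0.0, lr=0.1, steps=50):
    """Gradient descent where every step is a map phase plus a reduce phase."""
    for _ in range(steps):
        grads = [grad_partition(w, p) for p in partitions]  # map phase
        w -= lr * reduce_grads(grads)                       # reduce phase
    return w

# Data generated from y = 3x, split across two simulated "nodes".
parts = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]
w = train(parts)  # converges toward w = 3.0
```

In a real deployment, the partitions would be HDFS input splits and the map/reduce phases would run as Hadoop or MRJob tasks; the averaging step is the same idea used to synchronize gradients in data-parallel deep learning.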

Highlights

  • Big volumes of data have started to accumulate, driven by advances in sensor technology, the Internet, social networks, wireless communication, and inexpensive memory, in various formats such as numerical, textual, and image data

  • We explored several parallel algorithms, optimization techniques, tools and libraries related to big data analytics and deep learning on various parallel architectures

  • In order to utilize these frameworks and libraries for developing large-scale deep learning models for big data analytics, they need to be extended to execute on multiple computing nodes, where each node holds a portion of the input samples and runs the model in parallel
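The last highlight hinges on partitioning the input samples across nodes and running the same computation on each partition in parallel. The sketch below emulates that with threads on a single machine, purely to make the sharding-and-combine idea concrete; the function names are our own, and on a real cluster the partitions would be HDFS splits processed by separate worker nodes.

```python
# Illustrative sketch: shard input samples across "nodes" and run the
# same partial computation on each shard in parallel, then combine.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(partition):
    """Per-node work: a partial statistic over one shard of the samples."""
    return sum(partition), len(partition)

def distributed_mean(samples, n_nodes=4):
    """Round-robin shard the samples, process shards in parallel, combine."""
    parts = [samples[i::n_nodes] for i in range(n_nodes)]
    with ThreadPoolExecutor(max_workers=n_nodes) as ex:
        results = list(ex.map(partial_sum, parts))
    total = sum(s for s, _ in results)
    count = sum(n for _, n in results)
    return total / count

m = distributed_mean(list(range(10)))  # mean of 0..9
```

The same shard/compute/combine structure carries over to deep learning, where the per-shard computation is a forward/backward pass and the combine step aggregates gradients or model updates.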

Summary

INTRODUCTION

Big volumes of data have started to accumulate, driven by advances in sensor technology, the Internet, social networks, wireless communication, and inexpensive memory, in formats such as numerical, textual, and image data. Such high-volume data can be analyzed using statistical and Computational Intelligence (CI) tools based on neuro-computing, fuzzy logic, clustering, Bayesian networks, Principal Component Analysis (PCA), etc. Deep learning is an active research area in both industry and academia, addressing practical problems such as image and speech recognition, neural machine translation, traffic management, and cancer detection. It has been successfully applied to task classification, object detection, motion modeling, dimensionality reduction, and network flow prediction [3].

LITERATURE REVIEW
TensorFlow
PyTorch
Caffe2
Comparison of Deep Learning Frameworks
Customize Code Optimizations of Deep Learning Algorithms
BUILDING BLOCKS FOR DEEP LEARNING IN BIG DATA ANALYTICS
PROPOSED SOFTWARE ABSTRACTIONS FOR DEEP LEARNING MODELS
API Process Flow
API Usage
API EVALUATIONS IN TERMS OF PERFORMANCE AND USAGE
Findings
CONCLUSION AND FUTURE WORK