Performance Overhead on Relational Join in Hadoop using Hive/Pig/Streaming - A Comparative Analysis

Prabin R. Sahoo

doi:10.5120/ijais12-450799

Abstract

Hadoop Distributed File System (HDFS) is quite popular in the big data world. It not only provides a framework for storing data in a distributed environment, but also has a set of tools to retrieve and process this data using the map-reduce concept. This paper discusses the results of evaluating major tools such as Hive, Pig, and Hadoop streaming for solving problems from a relational perspective, and compares their performance. Although big data platforms cannot match the strength of relational databases in solving relational problems, big data is still about data, so the relational nature of data access cannot be eliminated altogether. Fortunately, there are ways to deal with this, which are discussed in this paper from a performance perspective. This may help the big data community understand the performance challenges so that further optimization can be done, and may help application developers learn how strategically relational operations need to be used.
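To make the idea of a relational join expressed in map-reduce terms concrete, the following is a minimal sketch of a reduce-side join, simulated in a single Python process. It is not the code evaluated in the paper. The function names (`map_records`, `reduce_side_join`), the tags `"C"` and `"O"`, and the sample customer and order rows are all hypothetical, chosen only for illustration. In a real Hadoop streaming job the map and reduce steps would typically be separate scripts reading standard input, with the framework performing the sort between them.

```python
from itertools import groupby
from operator import itemgetter


def map_records(tag, rows):
    """Map step: emit (join_key, tag, payload) so the source table is known later."""
    for key, payload in rows:
        yield (key, tag, payload)


def reduce_side_join(customers, orders):
    """Simulate the shuffle (sort by key) and the reduce step (pair rows per key)."""
    tagged = list(map_records("C", customers)) + list(map_records("O", orders))
    # Sort by join key, then by tag, so each key's "C" rows arrive before its "O" rows.
    tagged.sort(key=itemgetter(0, 1))

    joined = []
    for key, group in groupby(tagged, key=itemgetter(0)):
        left = []  # customer payloads buffered for the current key
        for _, tag, payload in group:
            if tag == "C":
                left.append(payload)
            else:
                # Inner join: an order with no matching customer produces nothing.
                joined.extend((key, c, payload) for c in left)
    return joined


# Hypothetical sample data: (customer_id, name) and (customer_id, item).
customers = [(1, "alice"), (2, "bob"), (3, "carol")]
orders = [(1, "book"), (2, "pen"), (1, "lamp"), (4, "ink")]
result = reduce_side_join(customers, orders)
```

The sketch buffers one side of the join per key and moves every row through the sort step, which hints at why joins written this way can carry noticeable overhead compared with a relational database.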


Journal: International Journal of Applied Information Systems
Publication Date: Dec 15, 2012
Citations: 2

Similar Papers

A Plugin-Based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS
Adithya Bhat ... Xiaoyi Lu
01 Jan 2015

ERP: An enhanced read policy for HDFS to improve read performance for files under construction
Junjie He ... Fei Hu
01 Dec 2015

Accelerating I/O Performance of Big Data Analytics on HPC Clusters through RDMA-Based Key-Value Store
Nusrat Sharmin Islam ... Xiaoyi Lu
01 Sep 2015

Major Challenges with Hadoop Distributed Framework: An Overview
Neelam Sobha Rani ... Neelam Venugopal Muthu Lakshmi
SSRN Electronic Journal
07 Feb 2018
