Integrated High-Performance Platform for Fast Query Response in Big Data with Hive, Impala, and SparkSQL: A Performance Evaluation

Bao Rong Chang, Hsiu-Fen Tsai, Yun-Da Lee

doi:10.3390/app8091514

Open Access

https://doi.org/10.3390/app8091514

Abstract

This paper integrates three big data tools, Hive, Impala, and SparkSQL, all of which support SQL-like queries for rapid data retrieval. The three tools are suitable for high-performance data retrieval in business intelligence, and as open-source software they offer a low-cost solution for small-to-medium enterprises. The proposed approach provides an in-memory cache and an in-disk cache, so a query receives a very fast response when a cache hit occurs. The paper also develops a platform selection mechanism that chooses the appropriate tool to handle each input query effectively and efficiently. With platform selection, job execution in the proposed approach is 2.63 times faster than Hive in the Case 1 experiment and 4.57 times faster in the Case 2 experiment.
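The two-level cache lookup described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `TwoTierCache`, the helper `run_query`, the JSON-file disk format, and the hashing of the SQL text as a cache key are all assumptions made for illustration. The `execute` callable stands in for whichever engine (Hive, Impala, or SparkSQL) actually runs the query.

```python
import hashlib
import json
import os


class TwoTierCache:
    """Hypothetical query-result cache: a dict in memory, JSON files on disk."""

    def __init__(self, disk_dir):
        self.memory = {}
        self.disk_dir = disk_dir
        os.makedirs(disk_dir, exist_ok=True)

    def _key(self, sql):
        # Assumption: the cache key is a hash of the trimmed SQL text.
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def _path(self, key):
        return os.path.join(self.disk_dir, key + ".json")

    def get(self, sql):
        """Return (source, rows); source is 'memory', 'disk', or 'miss'."""
        key = self._key(sql)
        if key in self.memory:
            return "memory", self.memory[key]
        path = self._path(key)
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as fh:
                rows = json.load(fh)
            self.memory[key] = rows  # promote the disk hit into memory
            return "disk", rows
        return "miss", None

    def put(self, sql, rows):
        """Store a result in both tiers."""
        key = self._key(sql)
        self.memory[key] = rows
        with open(self._path(key), "w", encoding="utf-8") as fh:
            json.dump(rows, fh)


def run_query(sql, cache, execute):
    """Answer from the cache if possible; otherwise call the engine and cache the result."""
    source, rows = cache.get(sql)
    if source != "miss":
        return source, rows
    rows = execute(sql)
    cache.put(sql, rows)
    return "engine", rows
```

In this sketch a repeated query is served from memory, and a query seen by an earlier process is served from disk and then promoted to memory. Only a miss in both tiers reaches the engine.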

Highlights

Storage and data acquisition costs have fallen sharply owing to technological advances, which has driven the rise of big data.
Hive could still complete a job when memory was insufficient, whereas Impala and SparkSQL crashed at some data sizes.
This paper automatically detects the status of a cluster by checking the remaining memory at each node, and chooses the appropriate tool to handle an SQL command for a fast query response.
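The memory-based choice of tool can be sketched as follows. The page does not give the paper's actual selection rule, so this is only an assumed one: the function name `select_platform`, the per-node memory requirement, and the `in_memory_engine` parameter are hypothetical. The sketch relies only on the observation above that Hive still completes jobs when memory is short while Impala and SparkSQL may crash.

```python
def select_platform(free_memory_gb_per_node, required_gb_per_node,
                    in_memory_engine="Impala"):
    """Pick an engine from the remaining memory reported by each node.

    free_memory_gb_per_node: dict mapping node name to free memory in GB.
    required_gb_per_node: assumed per-node memory the query needs (hypothetical input).
    in_memory_engine: engine to prefer when memory suffices ("Impala" or "SparkSQL").
    """
    if not free_memory_gb_per_node:
        raise ValueError("no node memory readings supplied")
    # The node with the least free memory limits the whole cluster.
    tightest = min(free_memory_gb_per_node.values())
    if tightest < required_gb_per_node:
        # Fall back to Hive, which still completes when memory is lacking.
        return "Hive"
    return in_memory_engine
```

For example, `select_platform({"node1": 8.0, "node2": 2.0}, 4.0)` returns `"Hive"` because one node has less free memory than the assumed requirement.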

Summary

Introduction

Storage and data acquisition costs have fallen sharply owing to technological advances, which has driven the rise of big data. Data volume is growing rapidly in industries around the world, for example, in social networks [1]. Handling big data raises many issues, such as network traffic flow, high-volume storage, remote backup, resource management, computing performance, package compatibility, data security, service level agreements, and disaster recovery. Many practical big data applications must find a proper way to deal with huge amounts of data that have a complex structure. Traditional tools cannot solve these problems, so it is worthwhile to keep developing new tools to tackle the issues that arise in a big data environment.

Journal: Applied Sciences
Publication Date: Sep 1, 2018
Citations: 2
License type: CC BY 4.0

Similar Papers

Resilient distributed computing platforms for big data analysis using Spark and Hadoop
Bao Rong Chang ... Chien-Feng Huang
01 May 2016

ADAM - A Database and Information Retrieval System for Big Multimedia Collections
Ivan Giangreco ... Heiko Schuldt
01 Jun 2014

A scholarly divide: Social media, Big Data, and unattainable scholarship
Asta Zelenkauskaite ... Erik P Bucy
First Monday | VOL. 21
24 Apr 2016

Development of Multiple Big Data Analytics Platforms with Rapid Response
Bao Rong Chang ... Po-Hao Liao
Scientific Programming | VOL. 2017
01 Jan 2017
