Abstract

This paper integrates three big data tools (Hive, Impala, and SparkSQL) that support SQL-like queries for rapid data retrieval. These tools are well suited to business intelligence workloads that require high-performance data retrieval, and as open-source software they offer a low-cost solution for small-to-medium enterprises. In practice, the proposed approach provides an in-memory cache and an in-disk cache, so a query receives a very fast response when a cache hit occurs. The paper also develops a platform selection mechanism that chooses the appropriate tool to handle each input query effectively and efficiently. With platform selection, job execution in the proposed approach is 2.63 times faster than Hive in the Case 1 experiment and 4.57 times faster in the Case 2 experiment.
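The two-level cache described above can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `TwoLevelQueryCache`, the SHA-256 key derived from the query text, and the JSON files on disk are all assumptions made for illustration. It only shows the lookup order (memory first, then disk, then a miss that would fall through to a query engine).

```python
import hashlib
import json
from pathlib import Path


class TwoLevelQueryCache:
    """Hypothetical two-level cache: a dict in memory, JSON files on disk."""

    def __init__(self, disk_dir):
        self.memory = {}  # in-memory level: key -> result rows
        self.disk_dir = Path(disk_dir)  # in-disk level: one file per query
        self.disk_dir.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def _key(sql):
        # Assumption: the trimmed query text identifies a cached result.
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def get(self, sql):
        """Return cached rows, or None on a miss at both levels."""
        key = self._key(sql)
        if key in self.memory:  # fastest path: in-memory hit
            return self.memory[key]
        path = self.disk_dir / f"{key}.json"
        if path.exists():  # slower path: in-disk hit
            rows = json.loads(path.read_text(encoding="utf-8"))
            self.memory[key] = rows  # promote to memory for next time
            return rows
        return None  # miss: the caller would run the query on an engine

    def put(self, sql, rows):
        """Store rows at both levels after a query has been executed."""
        key = self._key(sql)
        self.memory[key] = rows
        path = self.disk_dir / f"{key}.json"
        path.write_text(json.dumps(rows), encoding="utf-8")
```

In this sketch a fresh process can still serve a previously stored result from disk, which is the reason for keeping the second level; eviction and invalidation are deliberately left out.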

Highlights

  • Storage and data acquisition costs have fallen sharply due to technological advances, driving the rise of big data in this era

  • Hive could still complete the job when memory was insufficient, but Impala and SparkSQL crashed at some data sizes

  • This paper automatically detects the status of a cluster by checking the remaining memory at each node, then chooses the appropriate tool to handle an SQL command for a fast query response
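The idea of choosing a tool from the remaining memory at the nodes can be sketched as follows. This is an illustrative assumption, not the paper's algorithm: the thresholds, the preference order between SparkSQL and Impala, the use of the minimum free memory across nodes, and the names `NodeStatus` and `select_platform` are all hypothetical. The only behavior taken from the text is that Hive is the fallback when memory is scarce.

```python
from dataclasses import dataclass


@dataclass
class NodeStatus:
    """Hypothetical snapshot of one cluster node."""
    name: str
    free_memory_gb: float


# Placeholder thresholds; the source does not state any actual values.
SPARKSQL_MIN_FREE_GB = 16.0
IMPALA_MIN_FREE_GB = 8.0


def select_platform(nodes):
    """Pick an engine name based on the scarcest node's free memory."""
    if not nodes:
        raise ValueError("at least one node status is required")
    # Assumption: the node with the least free memory is the bottleneck.
    min_free = min(node.free_memory_gb for node in nodes)
    if min_free >= SPARKSQL_MIN_FREE_GB:
        return "sparksql"
    if min_free >= IMPALA_MIN_FREE_GB:
        return "impala"
    # Hive can still complete the job when memory is lacking.
    return "hive"
```

A real selector would also need to consider the data size of the query, since the highlights indicate that crashes depended on the scale of the data.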


Summary

Introduction

Storage and data acquisition costs have fallen sharply due to technological advances, driving the rise of big data in this era. Data volume is growing rapidly in industries around the world, for example in social networks [1]. Big data manipulation raises many issues, including network traffic flow, high-volume storage, remote backup, resource management, computing performance, package compatibility, data security, service level agreements, and disaster recovery. Many practical big data applications must find a proper way to handle huge amounts of data with complex structure. Traditional tools cannot solve the problems mentioned above, so it is worthwhile to continue developing new tools to tackle these issues in big data environments.
