Abstract
This paper integrates three big data tools, Hive, Impala, and SparkSQL, which support SQL-like queries for rapid data retrieval from big data. These tools suit business intelligence workloads that need high-performance data retrieval, and as open-source software they offer a low-cost solution for small-to-medium enterprises. In practice, the proposed approach provides an in-memory cache and an in-disk cache so that a query is answered very quickly when a cache hit occurs. Moreover, this paper develops a mechanism called platform selection, which selects the appropriate tool to handle each input query effectively and efficiently. As a result, job execution with the proposed approach using platform selection is 2.63 times faster than Hive in the Case 1 experiment and 4.57 times faster in the Case 2 experiment.
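The cache lookup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `TwoLevelQueryCache`, the function `run_query`, the JSON file layout, and keying the cache on the trimmed SQL text are all assumptions made for illustration. The sketch checks the in-memory cache first, then the in-disk cache, and only calls the query engine on a miss.

```python
import hashlib
import json
import os


class TwoLevelQueryCache:
    """Hypothetical in-memory plus in-disk result cache keyed by SQL text."""

    def __init__(self, disk_dir):
        self.memory = {}          # first level: results held in RAM
        self.disk_dir = disk_dir  # second level: one JSON file per query
        os.makedirs(disk_dir, exist_ok=True)

    def _key(self, sql):
        # Assumption: two queries are "the same" if their trimmed text matches.
        return hashlib.sha256(sql.strip().encode("utf-8")).hexdigest()

    def _path(self, key):
        return os.path.join(self.disk_dir, key + ".json")

    def get(self, sql):
        """Return (level, rows); level is 'memory', 'disk', or None on a miss."""
        key = self._key(sql)
        if key in self.memory:
            return "memory", self.memory[key]
        path = self._path(key)
        if os.path.exists(path):
            with open(path, "r", encoding="utf-8") as f:
                rows = json.load(f)
            self.memory[key] = rows  # promote the disk hit into memory
            return "disk", rows
        return None, None

    def put(self, sql, rows):
        """Store a result in both cache levels."""
        key = self._key(sql)
        self.memory[key] = rows
        with open(self._path(key), "w", encoding="utf-8") as f:
            json.dump(rows, f)


def run_query(sql, cache, execute):
    """Answer from the cache if possible, otherwise call execute(sql)."""
    level, rows = cache.get(sql)
    if level is not None:
        return rows, level
    rows = execute(sql)  # placeholder for a call to Hive, Impala, or SparkSQL
    cache.put(sql, rows)
    return rows, "engine"
```

Here `execute` stands in for whatever client actually submits the query. A real deployment would also need eviction and invalidation when the underlying tables change, which this sketch leaves out.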
Highlights
Storage and data acquisition costs have fallen sharply thanks to technological advances, which has driven the rise of big data
Hive could still complete a job when memory was insufficient, whereas Impala and SparkSQL crashed at some data sizes
This paper automatically detects the status of a cluster by checking the remaining memory on each node and chooses the appropriate tool to process an SQL command for a fast query response
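The memory-based selection described in the highlights can be sketched as follows. The paper's actual decision rule is not given here, so everything specific in this sketch is an assumption: the function name `select_platform`, the per-node memory estimate, the `safety_factor` default, and the choice of which in-memory tool to prefer. The only behaviour taken from the text is that Hive is the fallback when memory is short, because Impala and SparkSQL may crash in that situation.

```python
def select_platform(free_memory_gb_per_node, estimated_input_gb,
                    preferred_in_memory="impala", safety_factor=1.5):
    """Pick a query tool from the remaining memory on each node.

    All thresholds are hypothetical; this is not the authors' rule.
    """
    if not free_memory_gb_per_node:
        raise ValueError("at least one node is required")

    # The node with the least free memory limits an in-memory engine.
    tightest_node_gb = min(free_memory_gb_per_node)

    # Assumption: the input is spread evenly across nodes, with headroom.
    nodes = len(free_memory_gb_per_node)
    required_per_node_gb = safety_factor * estimated_input_gb / nodes

    if tightest_node_gb < required_per_node_gb:
        # Hive is reported to finish jobs even when memory is insufficient.
        return "hive"
    return preferred_in_memory


# Two nodes with little free memory fall back to Hive.
print(select_platform([2.0, 1.0], estimated_input_gb=10.0))
# Ample free memory allows the preferred in-memory tool.
print(select_platform([16.0, 12.0], estimated_input_gb=10.0))
```

In practice the free-memory figures would come from the cluster's monitoring interface, and the threshold would need tuning against measured workloads.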
Summary
Storage and data acquisition costs have fallen sharply thanks to technological advances, which has driven the rise of big data. Data volume is growing rapidly in industries around the world, for example in social networks [1]. Big data manipulation raises many issues, including network traffic flow, high-volume storage, remote backup, resource management, computing performance, package compatibility, data security, service level agreements, and disaster recovery. Many practical big data applications must find a proper way to deal with huge amounts of data that have a complex structure. Traditional tools cannot solve these problems, so it is worthwhile to keep developing new tools that tackle these issues in a big data environment.