MULTI-JOIN-ORDERING QUERY OPTIMIZATION ALGORITHM FOR HIVE WAREHOUSE WITH MAPREDUCE

Ms Nisha Jain, Dr Preeti Tiwari

doi:10.55399/hssg6334

Abstract

According to the Digital Report of July 2021, billions of users around the world use mobile phones, the Internet, and social media every second. This huge range of heterogeneous digital data is called Big Data and is measured in terabytes or petabytes. Conventional relational databases have difficulty handling such heterogeneous data for data analytics, although they remain in significant use as Big Data grows. Hadoop is one of the prominent and well-suited solutions for storing and processing Big Data, and Hive supports SQL queries on Hadoop. The Hive warehouse is the oldest SQL engine on top of the Hadoop framework, and it uses HDFS (Hadoop Distributed File System) to store the processed data. On Hadoop, MapReduce is the execution engine that runs SQL-based queries. In query optimization, join ordering plays a significant role because changing the order in which tables are joined can greatly reduce the execution time of a query. The main problem with Hive is that it does not optimize the join order of an SQL query and does not guarantee an optimal execution plan. The time complexity of the join ordering problem is exponential (Shan, Y., & Chen, Y., 2015). The main focus of this paper is to find the best join ordering solution for the Hive query optimization problem through appropriate search algorithms, and to improve the performance of SQL-based Hive queries on a MapReduce-based system.
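To make the join ordering claim concrete, the following is a minimal sketch and not the algorithm proposed in this paper. It enumerates every left-deep join order by brute force and scores each with a simplified cost model, the sum of intermediate result sizes. The table names, row counts, selectivities, and the functions `plan_cost` and `best_join_order` are all hypothetical, and the cost model is an assumption rather than Hive's own. Because it tries every permutation, the search grows factorially with the number of tables, which is why search heuristics are attractive.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical row counts per table (illustrative only, not from the paper).
CARD = {"orders": 1_000_000, "customers": 50_000, "nations": 25}

# Hypothetical join selectivities; pairs without an entry have no join
# predicate, so joining them directly is a cross product (selectivity 1).
SEL = {
    frozenset({"orders", "customers"}): Fraction(1, 50_000),
    frozenset({"customers", "nations"}): Fraction(1, 25),
}


def plan_cost(order):
    """Sum of intermediate result sizes for a left-deep join order."""
    joined = [order[0]]
    rows = Fraction(CARD[order[0]])
    cost = Fraction(0)
    for table in order[1:]:
        rows *= CARD[table]
        # Apply every predicate linking the new table to tables already joined.
        for prev in joined:
            rows *= SEL.get(frozenset({prev, table}), Fraction(1))
        joined.append(table)
        cost += rows
    return cost


def best_join_order(tables):
    """Exhaustive search over all permutations: O(n!) plans for n tables."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    for order in permutations(CARD):
        print(" JOIN ".join(order), "->", int(plan_cost(order)))
    print("best:", best_join_order(list(CARD)))
```

Under these assumed statistics, joining the two small tables first keeps the intermediate result small, whereas an order that pairs `orders` with `nations` before `customers` forces a cross product and a much larger intermediate result.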

Full Text