Abstract

Big data refers to amounts of data so large that new technologies are required to process them quickly. Processing such large amounts of data with traditional devices is ineffective. Big data gives businesses an added advantage and enables better service delivery, and it is changing the decision-making process of many business organizations. Big data poses many challenges related to the 5Vs: Volume, Velocity, Variety, Veracity, and Value. Hadoop is a big data tool used to process large amounts of data. It has many subcomponents that work together to achieve faster processing. Apache Hive and Apache Pig are tools in the Hadoop ecosystem that access data in different ways. Apache Hive relies on SQL-like queries, while Apache Pig uses scripts. Both tools use the MapReduce or Apache Tez framework to access data. In this paper, we analyze how these two frameworks use the Hadoop Distributed File System (HDFS) by comparing them both theoretically and empirically.
