Abstract

Hadoop Distributed File System (HDFS) is widely used in the big data world. It not only provides a framework for storing data in a distributed environment, but also comes with a set of tools to retrieve and process this data using the MapReduce concept. This paper presents the results of evaluating major tools such as Hive, Pig, and Hadoop Streaming for solving problems from a relational perspective, and compares their performance. Although big data tools cannot match the strength of relational databases in solving relational problems, big data is still about data, so the relational nature of data access cannot be eliminated altogether. Fortunately, there are ways to deal with this, which are discussed in this paper from a performance perspective. This may help the big data community understand the performance challenges so that further optimization can be done, and help application developers learn how to use relational operations strategically.

