Abstract

Hadoop is a popular implementation of the MapReduce programming model, which has made programming on distributed systems much easier. In computing, convenience generally comes at the cost of performance. Compared with MPI, Hadoop simplifies programming but degrades performance. In this work, we focus on the comparison between Hadoop and Hadoop Streaming. Hadoop Streaming is widely used because it frees programmers from the Java language, making the power of Hadoop easier to use, but it also brings a performance penalty. Through a detailed analysis of the Hadoop Streaming mechanism, we find that the pipe is the major bottleneck. In our experiments, we evaluate the performance of Hadoop Streaming with 6 benchmarks. The experimental results show that Hadoop Streaming significantly degrades performance only for data-intensive jobs; for compute-intensive jobs, Hadoop Streaming may even perform better, because the job can be written in a language more efficient than Java.
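To make the pipe mechanism concrete, the following is a minimal sketch of a word-count mapper of the kind Hadoop Streaming can run. It is an illustration, not code from this work: the names `map_lines` and `main` are hypothetical, and the tab-separated key/value line format is assumed to be the default Streaming convention. The point is that every input and output record crosses a pipe as a line of text, which is why data-intensive jobs are the ones expected to pay the most.

```python
import sys


def map_lines(lines):
    """Yield one tab-separated "word<TAB>1" record per word in the input lines."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Assumption: the framework feeds input records on stdin, one per line,
    # and reads the mapper's output records back from stdout.
    for record in map_lines(stdin):
        stdout.write(record + "\n")

# An actual mapper script would end with a call to main().
```

With a call to `main()` added at the end, such a script can be checked locally by piping a text file into it, which exercises the same stdin/stdout path the framework would use.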
