Abstract
Google’s MapReduce has emerged as a popular framework for data-intensive computing. It is well known for its elastic scalability and fine-grained fault tolerance. Its efficiency, however, remains a subject of debate. In particular, local and network I/O can be a primary factor that degrades the performance of MapReduce, because it follows a data-shipping paradigm in which many partitioned data blocks move across distributed nodes. In this paper, we conduct a performance study to examine the I/O cost of MapReduce. Our results show that I/O accounts for about 80% of the total processing cost when processing OLAP queries on the MapReduce platform.