Abstract

Data-intensive applications such as MapReduce can suffer significant performance degradation from I/O interference, which arises when multiple processes access the same I/O resources simultaneously, particularly disks. Understanding this effect is necessary to improve resource allocation and utilization for these applications. In this paper, we propose a model for predicting the impact of I/O interference on MapReduce application performance. Given basic parameters of the workload and the hardware environment, together with knowledge of the application's I/O behavior, our model predicts how I/O interference affects the application's scalability. We compare the model's predictions for several workloads (TeraSort, WordCount, PFP Growth and PageRank) against the actual behavior of those workloads in a real cluster environment and confirm that our model can provide highly accurate predictions.
