Abstract

Summary

MapReduce has become a popular model for large‐scale data processing in recent years. Many works on MapReduce scheduling (e.g., load balancing and deadline‐aware scheduling) have emphasized the importance of predicting the workload received by individual reducers. However, because the input characteristics and the user‐specified map function of a given job are unknown to the MapReduce framework before the job starts, accurately predicting reducer workload is difficult. To address this challenge, we present ROUTE, a run‐time robust reducer workload estimation technique for MapReduce. ROUTE progressively samples the partition sizes of early‐completed mappers, which allows it to perform estimation at run time while still meeting the accuracy requirement specified by users. Moreover, by using robust estimation and bootstrap resampling techniques, ROUTE is applicable to a wide variety of applications. Through experiments using both real and synthetic data on an 11‐node Hadoop cluster, we show that ROUTE achieves high accuracy, with an error rate of no more than 10.92% and a 40.6% improvement in error rate compared with the state‐of‐the‐art solution. In addition, through simulations using synthetic data, we show that ROUTE is robust to a variety of skewed distributions. Finally, we apply ROUTE to existing load balancing and deadline‐aware scheduling frameworks and show that ROUTE significantly improves their performance. Copyright © 2016 John Wiley & Sons, Ltd.
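The general idea of progressive sampling with a bootstrap stopping rule can be illustrated with a minimal sketch. This is not ROUTE's actual algorithm, which the abstract does not specify. It assumes a trimmed mean as a stand-in robust estimator, a percentile bootstrap interval, and a hypothetical stopping rule that halts once the interval half-width falls within a user-given relative error. All function and parameter names (`trimmed_mean`, `bootstrap_interval`, `estimate_reducer_workload`, `max_rel_error`, `batch`) are illustrative.

```python
import random
import statistics


def trimmed_mean(values, trim=0.1):
    """Mean after dropping the lowest and highest `trim` fraction of values."""
    ordered = sorted(values)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.fmean(kept)


def bootstrap_interval(sample, estimator, rounds=500, alpha=0.05, rng=None):
    """Percentile bootstrap interval for `estimator` over `sample`."""
    rng = rng or random.Random(0)
    stats = sorted(
        estimator(rng.choices(sample, k=len(sample))) for _ in range(rounds)
    )
    lo = stats[int((alpha / 2) * rounds)]
    hi = stats[min(rounds - 1, int((1 - alpha / 2) * rounds))]
    return lo, hi


def estimate_reducer_workload(partition_sizes, total_mappers,
                              max_rel_error=0.1, batch=5, seed=0):
    """Estimate one reducer's total input from early-completed mappers.

    `partition_sizes` yields, in completion order, the size of the partition
    each finished mapper produced for this reducer. Sampling stops once the
    bootstrap interval half-width is within `max_rel_error` of the estimate
    (an assumed stopping rule), or when the input is exhausted.
    """
    rng = random.Random(seed)
    sample = []
    for size in partition_sizes:
        sample.append(size)
        if len(sample) % batch != 0:
            continue  # re-check the stopping rule once per batch
        center = trimmed_mean(sample)
        lo, hi = bootstrap_interval(sample, trimmed_mean, rng=rng)
        if center > 0 and (hi - lo) / 2 <= max_rel_error * center:
            break
    if not sample:
        raise ValueError("no completed mappers to sample from")
    # Scale the per-mapper estimate up to the full set of mappers.
    return trimmed_mean(sample) * total_mappers, len(sample)
```

Called as `estimate_reducer_workload(sizes, total_mappers=50)`, the function returns the estimated workload together with the number of mappers it sampled before stopping. A tighter `max_rel_error` trades a later estimate for a narrower interval.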

