Abstract
Sequentially arriving jobs share a MapReduce cluster, each desiring a fair allocation of computing resources to serve its associated map and reduce tasks. The model of such a system consists of a processor sharing queue for the MapTasks and a multi-server queue for the ReduceTasks. These two queues are dependent through a constraint that the input data of each ReduceTask are fetched from the intermediate data generated by the MapTasks belonging to the same job. A more generalized form of MapReduce queueing model can capture the essence of other distributed data processing systems that contain interdependent processor sharing queues and multi-server queues. Through theoretical modeling and extensive experiments, we show that, this dependence, if not carefully dealt with, can cause non-work-conserving effects that negatively impact system performance and scalability. First, we characterize the heavy-traffic approximation. Depending on how tasks are scheduled, the number of jobs in the system can even exhibit jumps in diffusion limits, resulting in prolonged job execution times. This problem can be mitigated through carefully applying a tie-breaking rule for ReduceTasks, which as a theoretical finding has direct engineering implications. Second, we empirically validate a criticality phenomenon using experiments. MapReduce systems experience an undesirable performance degradation when they have reached certain critical points, another finding that offers fundamental guidance on managing MapReduce systems.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.