Abstract
As cloud computing continues to mature, IT managers are increasingly focusing on additional performance requirements: quality of service and tailored resource allocation for achieving service performance goals. In this paper, we consider the popular Pig framework, which provides a high-level SQL-like abstraction on top of the MapReduce engine for processing large data sets. Programs written in such frameworks are compiled into directed acyclic graphs (DAGs) of MapReduce jobs. Often, data processing applications have to produce results by a certain deadline. We design a performance modeling framework for Pig programs that solves two inter-related problems: (i) estimating the completion time of a Pig program as a function of allocated resources, and (ii) estimating the amount of resources (the number of map and reduce slots) required for completing a Pig program within a given (soft) deadline. To achieve these goals, we first optimize the execution of a Pig program by enforcing the optimal schedule of its concurrent jobs. This optimization reduces the program completion time (by 10%-27% in our experiments) and, moreover, eliminates possible non-determinism in the DAG's execution. Based on this optimization, we propose an accurate performance model for Pig programs. This approach leads to significant resource savings (20%-60% in our experiments) compared with the original, unoptimized solution. We validate our approach on a 66-node Hadoop cluster using two workload sets: TPC-H queries and a set of customized queries mining a collection of HP Labs' web proxy logs.
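To make problems (i) and (ii) concrete, the following is a minimal illustrative sketch and not the performance model proposed in the paper. It assumes a deliberately simplified setting: the jobs of a program run one after another in a fixed order, each job's map and reduce stages execute in full "waves" of tasks, and every task in a stage takes the same average time. The names `Job`, `job_time`, `program_time`, and `min_slots_for_deadline` are hypothetical and introduced only for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    """Hypothetical per-job profile (all fields are illustrative assumptions)."""
    map_tasks: int       # number of map tasks
    map_secs: float      # assumed average duration of one map task
    reduce_tasks: int    # number of reduce tasks
    reduce_secs: float   # assumed average duration of one reduce task


def job_time(job, map_slots, reduce_slots):
    """Estimate one job's duration under the simplified wave assumption."""
    map_waves = math.ceil(job.map_tasks / map_slots)
    reduce_waves = math.ceil(job.reduce_tasks / reduce_slots)
    return map_waves * job.map_secs + reduce_waves * job.reduce_secs


def program_time(jobs, map_slots, reduce_slots):
    """Problem (i): completion time as a function of allocated slots.

    Assumes the jobs run sequentially in the given order, so the
    estimate is simply the sum of the per-job estimates.
    """
    return sum(job_time(j, map_slots, reduce_slots) for j in jobs)


def min_slots_for_deadline(jobs, deadline, max_slots):
    """Problem (ii): smallest (map_slots, reduce_slots) meeting the deadline.

    Exhaustive search minimizing map_slots + reduce_slots, with each
    count capped at max_slots. Returns None if no allocation within
    the cap meets the deadline.
    """
    best = None
    for m in range(1, max_slots + 1):
        for r in range(1, max_slots + 1):
            if program_time(jobs, m, r) <= deadline:
                if best is None or m + r < sum(best):
                    best = (m, r)
                break  # for this m, a larger r only uses more slots
    return best


if __name__ == "__main__":
    # Made-up job profiles, purely for illustration.
    jobs = [Job(10, 2.0, 4, 3.0), Job(6, 1.0, 2, 5.0)]
    print(program_time(jobs, 5, 2))
    print(min_slots_for_deadline(jobs, deadline=17.0, max_slots=10))
```

The sketch only shows the shape of the two questions: a function from allocated slots to estimated completion time, and a search over allocations for the cheapest one that meets a deadline. It ignores the overlap between concurrent jobs in the DAG, which is what the schedule optimization described above addresses.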