Abstract

Big data analytics technologies such as Hadoop and Spark run on compute clusters managed by resource managers such as YARN. YARN controls the resources available to individual applications and thereby affects job performance. Manually tuning YARN's configuration parameters can yield sub-optimal and brittle performance, because parameters that are optimal for one job may be poorly suited to another. In this paper we present KERMIT, the first on-line automatic tuning system for YARN. KERMIT optimizes, in real time, the memory and CPU allocated to individual YARN containers by analysing container response-time performance. Previous automatic tuning methods target specific systems such as Spark or Hadoop. In contrast, this is the first study to focus on the more general case of on-line, real-time tuning of YARN container density and its effect on the performance of applications running on YARN. KERMIT uses the same tuning code to automatically tune any system that runs on YARN, including both Spark and Hadoop. We evaluated the effectiveness of our technique on Hadoop and Spark jobs using the Terasort, TPCx-HS, and SMB benchmarks. KERMIT achieved an efficiency of more than 92% relative to the best possible tuning configuration (identified by exhaustive search of the parameter space) and was up to 30% faster than basic manual tuning.
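
The general idea of on-line tuning can be illustrated with a small sketch. The abstract does not specify KERMIT's search algorithm, so the code below uses a plain greedy hill-climbing search as a stand-in. It assumes a hypothetical `measure` callback that runs or samples containers at a given memory size and returns an observed response time. It also assumes the candidate sizes are sorted in ascending order, for example multiples of YARN's `yarn.scheduler.minimum-allocation-mb`. The `efficiency` helper assumes efficiency means the exhaustive-search optimum's runtime divided by the tuned configuration's runtime. The paper's exact definition may differ.

```python
from typing import Callable, List


def hill_climb_container_memory(
    candidates: List[int],
    measure: Callable[[int], float],
    start_index: int = 0,
) -> int:
    """Greedy on-line search over container memory sizes (MB).

    Illustrative stand-in, not KERMIT's algorithm. `candidates` is assumed
    sorted ascending; `measure` is a hypothetical callback returning the
    observed response time for a container of the given size.
    """
    i = start_index
    best = measure(candidates[i])
    while True:
        # Probe the immediate neighbours of the current size.
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(candidates)]
        if not neighbours:
            return candidates[i]
        t, j = min((measure(candidates[j]), j) for j in neighbours)
        # Stop when no neighbour improves the observed response time.
        if t >= best:
            return candidates[i]
        i, best = j, t


def efficiency(best_time: float, tuned_time: float) -> float:
    """Assumed metric: 1.0 means the tuner matched exhaustive search."""
    return best_time / tuned_time


if __name__ == "__main__":
    # Synthetic cost model (not measured data): containers that are too small
    # or too large both slow the job, with an optimum at 3072 MB.
    cost = lambda mb: 100 + abs(mb - 3072) / 64
    sizes = [1024, 2048, 3072, 4096, 5120, 6144]
    chosen = hill_climb_container_memory(sizes, cost)
    exhaustive = min(sizes, key=cost)
    print(chosen, efficiency(cost(exhaustive), cost(chosen)))
```

With the synthetic cost model, the search starts at 1024 MB and climbs to 3072 MB, which matches the exhaustive optimum, so the efficiency is 1.0. On a real cluster, response times are noisy and workloads shift over time. A practical on-line tuner would therefore need to average several measurements and re-trigger the search when behaviour changes. Greedy search can also stop at a local optimum.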
