Abstract

Elastic Hadoop applications consisting of multiple virtual machines (VMs) are widely used to support big data analysis and processing. In this scenario, flash-based solid-state drives (SSDs) are usually deployed on hypervisors and used as a cache to improve IO performance. However, existing SSD caching schemes are mostly VM-centric: they focus on the low-level IO performance metrics of individual VMs. Such schemes may not optimize the performance of an elastic Hadoop application, i.e., its job completion time (JCT), because the VMs inside the application differ in importance even when they have similar low-level IO patterns. Considering the IO dependency among VMs and determining their importance, which we regard as application-centric metrics, may improve performance further. We present an IO-dependency-based requirement model that characterizes the SSD cache requirement of each VM inside an elastic Hadoop application, and then use it in a genetic algorithm (GA) based approach to calculate near-optimal VM weights for allocating per-VM SSD cache space and I/O operations per second (IOPS) capacity. Furthermore, we present a tool, AC-SSD, based on this approach and introduce closed-loop adaptation to react to continuously changing workloads. The evaluation shows that, compared with a shared cache, AC-SSD reduces the JCT by up to 39% for IO-sensitive workloads, by up to 29% for continuously changing workloads, and by over 12.5% across different data scales.
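
To make the idea of GA-derived per-VM weights concrete, the following is a minimal sketch and not the AC-SSD implementation. The abstract does not specify the requirement model or the fitness function, so everything here is an assumption made for illustration: the `toy_jct` cost (the job finishes when its slowest VM finishes), the `demand` values, and all function names are hypothetical.

```python
import random


def normalize(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def allocate(weights, total_cache_gb):
    """Split the shared SSD cache among VMs in proportion to their weights."""
    return [w * total_cache_gb for w in normalize(weights)]


def toy_jct(weights, demand):
    """Hypothetical stand-in for JCT: the slowest VM determines completion.

    demand[i] is an invented per-VM cache requirement; a VM with a small
    share relative to its demand takes longer.
    """
    shares = normalize(weights)
    return max(d / s for d, s in zip(demand, shares))


def ga_weights(demand, generations=200, pop_size=30, seed=0):
    """Toy genetic algorithm: elitist selection, uniform crossover, mutation."""
    rng = random.Random(seed)
    n = len(demand)
    # Seed the population with uniform weights (the shared-cache-like baseline)
    # so the result is never worse than that baseline under toy_jct.
    population = [[1.0] * n]
    population += [[rng.uniform(0.05, 1.0) for _ in range(n)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda w: toy_jct(w, demand))
        elite = population[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(n)
            # Mutate one gene, keeping it strictly positive.
            child[i] = max(0.05, child[i] + rng.gauss(0, 0.1))
            children.append(child)
        population = elite + children
    best = min(population, key=lambda w: toy_jct(w, demand))
    return normalize(best)


if __name__ == "__main__":
    demand = [4.0, 2.0, 1.0, 1.0]  # invented requirements for four VMs
    weights = ga_weights(demand)
    print("weights:", [round(w, 3) for w in weights])
    print("cache (GB):", [round(c, 1) for c in allocate(weights, 100.0)])
```

Under this toy cost, the search tends to shift cache toward the VMs with higher demand, which mirrors the application-centric intuition in the abstract. A real fitness function would have to come from the IO-dependency-based requirement model, and the same weights could be applied to IOPS capacity as well as cache space.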
