Clouds have been widely adopted by many organizations for their support of flexible resource demands at low cost, which is typically achieved by sharing the underlying hardware among multiple cloud tenants. However, such sharing, together with changing resource contention among virtual machines (VMs), can cause large variations in the performance of cloud applications, making it difficult for ordinary cloud users to estimate the run-time performance of their applications. In this article, we propose online learning methodologies for performance modeling and prediction of applications that run repetitively on multi-tenant clouds (such as online data-analytic tasks). Here, a few micro-benchmarks are utilized to probe the in-situ perceivable performance of the CPU, memory, and I/O components of the target VM. Then, based on this profiling information and the application's measured performance, predictive models can be derived with either regression or neural-network techniques. In particular, to address changes over time in the intensity of a VM's resource contention and its effects on the target application, we propose periodic model retraining, where a sliding-window technique is exploited to control the retraining frequency and the amount of historical data used for model retraining. Moreover, a progressive modeling approach is devised in which the regression and neural-network models are gradually updated to better adapt to recent changes in resource contention. Considering 17 representative applications from the PARSEC, NAS Parallel, and CloudSuite benchmarks, we have extensively evaluated the proposed online schemes for the prediction accuracy of the resulting models and the associated overheads on both a private and a public cloud.
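The sliding-window periodic retraining described above can be sketched in general terms as follows. This is a minimal illustration of the technique, not the paper's implementation: the class name, the linear-regression model, the feature layout (micro-benchmark probe scores), and the window/frequency parameters are all assumptions for the sake of the example.

```python
# Hypothetical sketch of sliding-window periodic retraining: a linear model
# mapping micro-benchmark probe scores to application runtime, refit over a
# bounded window of recent observations. Names and parameters are assumed.
from collections import deque
import numpy as np

class SlidingWindowPredictor:
    """Performance model periodically retrained on a sliding window."""

    def __init__(self, window_size=20, retrain_every=5):
        self.window = deque(maxlen=window_size)  # keeps only recent history
        self.retrain_every = retrain_every       # controls retraining frequency
        self.coef = None
        self._since_retrain = 0

    def observe(self, probe_features, runtime):
        # probe_features: e.g. CPU, memory, and I/O micro-benchmark scores
        self.window.append((np.asarray(probe_features, float), float(runtime)))
        self._since_retrain += 1
        if self.coef is None or self._since_retrain >= self.retrain_every:
            self._retrain()
            self._since_retrain = 0

    def _retrain(self):
        # Least-squares fit over the current window only (with a bias term),
        # so stale contention behavior ages out of the model.
        X = np.array([np.append(f, 1.0) for f, _ in self.window])
        y = np.array([r for _, r in self.window])
        self.coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    def predict(self, probe_features):
        x = np.append(np.asarray(probe_features, float), 1.0)
        return float(x @ self.coef)
```

A larger window or a higher retraining frequency would generally improve accuracy under drifting contention, at the cost of extra run-time overhead, which mirrors the trade-off evaluated in the article.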
The evaluation results show that, even on the private cloud with high and drastically changing resource contention, the average prediction errors of the considered models can be kept below 20 percent with periodic retraining. The prediction errors generally decrease with higher retraining frequencies and more historical data points, at the cost of higher run-time overheads. Furthermore, with the neural-network progressive models, the average prediction errors can be reduced by about 7 percent with much lower run-time overheads (up to 265X) on the private cloud. For public clouds with less resource contention, the average prediction errors can be below 4 percent for the considered models with our proposed online schemes.