Abstract

In today's scaled out systems, co-scheduling data analytics work with high priority user workloads is common as it utilizes better the vast hardware availability. User workloads are dominated by periodic patterns, with alternating periods of high and low utilization, creating promising conditions to schedule data analytics work during low activity periods. To this end, we show the effectiveness of machine learning models in accurately predicting user workload intensities, essentially by suggesting the most opportune time to co-schedule data analytics work. Yet, machine learning models cannot predict the effects of performance interference when co-scheduling is employed, as this constitutes a "new" observation. Specifically, in tiered storage systems, their hierarchical design makes performance interference even more complex, thus accurate performance prediction is more challenging. Here, we quantify the unknown performance effects of workload co-scheduling by enhancing machine learning models with queuing theory ones to develop a hybrid approach that can accurately predict performance and guide scheduling decisions in a tiered storage system. Using traces from commercial systems we illustrate that queuing theory and machine learning models can be used in synergy to surpass their respective weaknesses and deliver robust co-scheduling solutions that achieve high performance.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call