EM-KDE: A locality-aware job scheduling policy with distributed semantic caches

Youngmoon Eom,Deukyeon Hwang,Junyong Lee,Jonghwan Moon,Minho Shin,Beomseok Nam

doi:10.1016/j.jpdc.2015.06.002

Abstract

In modern query processing systems, the caching facilities are distributed and scale with the number of servers. To maximize the overall system throughput, the distributed system should balance the query loads among servers and also leverage cached results. In particular, leveraging distributed cached data is becoming more important as many systems are being built by connecting many small heterogeneous machines rather than relying on a few high-performance workstations. Although many query scheduling policies exist such as round-robin and load-monitoring, they are not sophisticated enough to both balance the load and leverage cached results. In this paper, we propose distributed query scheduling policies that take into account the dynamic contents of distributed caching infrastructure and employ statistical prediction methods into query scheduling policy.We employ the kernel density estimation derived from recent queries and the well-known exponential moving average (EMA) in order to predict the query distribution in a multi-dimensional problem space that dynamically changes. Based on the estimated query distribution, the front-end scheduler assigns incoming queries so that query workloads are balanced and cached results are reused. Our experiments show that the proposed query scheduling policy outperforms existing policies in terms of both load balancing and cache hit ratio.

Full Text