SLITS: Sparsity-Lightened Intelligent Thread Scheduling

Wangkai Jin, Xiangjun Peng

doi:10.1145/3579436

Abstract

A diverse set of scheduling objectives (e.g., resource contention, fairness, and priority) has given rise to a series of objective-specific schedulers for multi-core architectures. Existing designs collect thread-to-thread statistics at runtime and schedule threads based on this abstraction, which we formalize as the Thread-Interaction Matrix (TIM). However, this abstraction also exposes a consistently overlooked issue: the TIM is highly sparse. As a result, existing designs can only make sub-optimal decisions, because sparsity limits the number of thread permutations (and their associated statistics) that can be exploited when making scheduling decisions. We introduce Sparsity-Lightened Intelligent Thread Scheduling (SLITS), a general scheduler design that mitigates the sparsity of the TIM and can be customized for different scheduling objectives. SLITS is built on the key insight that the sparsity of the TIM can be effectively mitigated with advanced Machine Learning (ML) techniques. SLITS has three components. First, SLITS profiles thread interactions for only a small number of thread permutations and forms the TIM from the runtime statistics. Second, SLITS estimates the missing values in the TIM using a Factorization Machine (FM), an ML technique that can fill in the missing values of a large-scale sparse matrix from limited information. Third, SLITS leverages Lazy Reschedule, a general mechanism that serves as the building block for customizing scheduling policies for different scheduling objectives. We show how SLITS can be (1) customized for different scheduling objectives, including resource contention and fairness, and (2) implemented with negligible hardware cost. We also discuss how SLITS could be applied to other thread-scheduling contexts. We evaluate two SLITS variants against four state-of-the-art scheduler designs.
We highlight that, across 11 benchmarks, SLITS achieves an average speedup of 1.08X over the de facto standard thread scheduler, the Completely Fair Scheduler (CFS), in a 16-core setting with a variety of thread counts (i.e., 32, 64, and 128). Our analysis reveals that the benefits of SLITS are attributable to significant improvements in cache utilization. In addition, our experimental results confirm that SLITS is scalable and that its benefits remain robust as the number of threads increases. We also perform extensive studies to (1) break SLITS down into its components to justify the synergy of our design choices, (2) examine the impact of varying the estimation coverage of FM, (3) justify the necessity of Lazy Reschedule rather than periodic rescheduling, and (4) demonstrate that the hardware overheads of SLITS implementations are marginal (<1% of chip area and power).
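The abstract does not say how SLITS encodes a thread pair as FM input or which runtime statistic populates the TIM, so the following is only a minimal sketch of the "profile a few pairs, estimate the rest with an FM" idea under stated assumptions, not the authors' implementation. It draws a made-up low-rank "interaction" score as a stand-in for profiled statistics, profiles a random 40% of ordered thread pairs, encodes each pair (i, j) as two one-hot features (an assumed encoding), fits a standard second-order Factorization Machine with plain SGD on the profiled entries, and predicts the missing ones. The names `FactorizationMachine`, `pair_features`, `fill_tim`, and `demo`, the encoding, and all hyperparameters are hypothetical.

```python
import math
import random


class FactorizationMachine:
    """Second-order factorization machine trained with SGD on squared loss.

    A feature vector is a sparse list of (index, value) pairs.
    """

    def __init__(self, n_features, k=4, lr=0.05, reg=0.01, seed=0):
        rng = random.Random(seed)
        self.w0 = 0.0  # global bias
        self.w = [0.0] * n_features  # linear weight per feature
        # k latent factors per feature; their dot products model pairwise interactions
        self.v = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_features)]
        self.k, self.lr, self.reg = k, lr, reg

    def predict(self, x):
        y = self.w0 + sum(self.w[i] * xi for i, xi in x)
        # Pairwise term in O(k * nnz): 0.5 * sum_f [(sum_i v_if x_i)^2 - sum_i (v_if x_i)^2]
        for f in range(self.k):
            s = sum(self.v[i][f] * xi for i, xi in x)
            s2 = sum((self.v[i][f] * xi) ** 2 for i, xi in x)
            y += 0.5 * (s * s - s2)
        return y

    def fit(self, samples, epochs=200):
        for _ in range(epochs):
            for x, target in samples:
                err = self.predict(x) - target  # derivative of 0.5 * err^2 w.r.t. the prediction
                # Per-factor sums, taken before any v is updated in this step
                sums = [sum(self.v[i][f] * xi for i, xi in x) for f in range(self.k)]
                self.w0 -= self.lr * err
                for i, xi in x:
                    self.w[i] -= self.lr * (err * xi + self.reg * self.w[i])
                    for f in range(self.k):
                        grad = xi * sums[f] - self.v[i][f] * xi * xi
                        self.v[i][f] -= self.lr * (err * grad + self.reg * self.v[i][f])


def pair_features(i, j, n):
    """Hypothetical encoding: one-hot for thread i, plus one-hot for thread j offset by n."""
    return [(i, 1.0), (n + j, 1.0)]


def fill_tim(tim, n, **fm_kwargs):
    """Return a dense copy of the TIM with missing (None) off-diagonal entries estimated."""
    observed = [(pair_features(i, j, n), tim[i][j])
                for i in range(n) for j in range(n)
                if i != j and tim[i][j] is not None]
    fm = FactorizationMachine(2 * n, **fm_kwargs)
    fm.fit(observed)
    return [[tim[i][j] if i == j or tim[i][j] is not None
             else fm.predict(pair_features(i, j, n))
             for j in range(n)] for i in range(n)]


def demo(n=16, fraction=0.4, seed=42):
    rng = random.Random(seed)
    # Made-up ground truth: a low-rank "interaction" score per ordered thread pair
    footprint = [rng.uniform(0.5, 1.5) for _ in range(n)]
    truth = [[0.0 if i == j else footprint[i] * footprint[j] for j in range(n)]
             for i in range(n)]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    profiled = set(rng.sample(pairs, int(fraction * len(pairs))))  # only these are profiled
    tim = [[truth[i][j] if i == j or (i, j) in profiled else None for j in range(n)]
           for i in range(n)]
    filled = fill_tim(tim, n)
    held_out = [p for p in pairs if p not in profiled]
    mean_obs = sum(truth[i][j] for i, j in profiled) / len(profiled)

    def rmse(pred):
        return math.sqrt(sum((pred(i, j) - truth[i][j]) ** 2 for i, j in held_out)
                         / len(held_out))

    return filled, rmse(lambda i, j: filled[i][j]), rmse(lambda i, j: mean_obs)


if __name__ == "__main__":
    _, fm_rmse, mean_rmse = demo()
    print(f"held-out RMSE, FM estimate:   {fm_rmse:.3f}")
    print(f"held-out RMSE, mean baseline: {mean_rmse:.3f}")
```

Running the script prints the held-out error of the FM estimates next to a baseline that predicts the mean of the profiled entries. The FM can estimate an unprofiled pair because its per-thread parameters (`w` and `v`) are shared across every pair a thread participates in, so a thread profiled with only a few partners still informs its other entries.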

Full Text