Guest Editorial: Special Section on Algorithm Design and Scheduling Techniques (Realistic Platform Models) for Heterogeneous Clusters

H. Casanova, H.J. Siegel, Y. Robert

doi:10.1109/tpds.2006.25

Abstract

The last decade has seen a dramatic increase in the deployment of heterogeneous distributed computing platforms, in particular those consisting of heterogeneous clusters and of multiple heterogeneous collections of clusters aggregated over wide-area networks into grids. The software infrastructures and mechanisms to deploy such platforms have been well studied, and implementations are already used in production, so that heterogeneous platforms represent a significant, and growing, fraction of the computational power delivered by parallel platforms today. In spite of these successes, many research challenges remain, including those pertaining to distributed algorithms and scheduling algorithms, which are critical for ensuring that these platforms are used effectively. In this context, the goal of this special section on “Algorithm Design and Scheduling Techniques (Realistic Platform Models) for Heterogeneous Clusters” is to gather papers that further our understanding of the impact of platform heterogeneity on the design and evaluation of such algorithms.

In the paper entitled “Allocating Non-Real-Time and Soft Real-Time Jobs in Multiclusters,” Ligang He, Stephen A. Jarvis, Daniel P. Spooner, Hong Jiang, Donna N. Dillenberger, and Graham R. Nudd introduce two workload allocation strategies for large-scale heterogeneous platforms. The first strategy achieves an optimized mean response time for jobs having no real-time requirements. The second strategy obtains an optimized mean miss rate for jobs having soft real-time requirements (i.e., a fraction of jobs are permitted to miss the real-time constraints). Both strategies take into account average system behaviors (such as the mean arrival rate of jobs) to calculate the workload proportions for individual clusters, and update the workload allocation on the fly when the change in the mean arrival rate reaches a certain threshold.
The allocation schemes are combined with two job dispatching strategies (weighted random and weighted round-robin) to generate new job scheduling algorithms for multicluster environments.

In their paper “On the Distribution of Sequential Jobs in Random Brokering for Heterogeneous Computational Grids,” Vandy Berten, Joel Goossens, and Emmanuel Jeannot study resource brokering for scheduling sequential jobs onto a grid platform that consists of heterogeneous sets of homogeneous processors, such as a set of clusters. Resources in each cluster are managed by a local scheduler that maintains a job queue. The paper studies a centralized “metascheduler” that uses a randomized strategy to share available resources among competing jobs. This research considers two cases, depending on whether the platform is heavily loaded or lightly loaded. For each case, it obtains both analytical and experimental characterizations of the queue lengths at each local scheduler, CPU utilization, and average job slowdowns. Furthermore, the paper presents a discussion of the system’s behavior when it transitions between a heavily loaded state and a lightly loaded one. All presented theoretical results are corroborated by simulations and provide a thorough description of randomized resource brokering.

The research in “Multiple Job Scheduling in a Connection-Limited Data Parallel System” presents a new method for scheduling jobs in a distributed system where the critical resource is the bandwidth to access the stored data. The authors, Alessandro Amoroso and Keith Marzullo, describe an approach that supports the master-worker scheme and can be applied to data parallel computation. They consider a typical wide-area data grid that comprises a set of sites, where each site has one or more local area networks. The platform model used is based on the Nile data grid. This paper uses a set of synthetic jobs to compare three schedulers: Greedy, Maxflow, and Hybrid.
They tested their new approach under various circumstances and measured its performance by means of several metrics. The new Hybrid scheduler is never worse than either of the other two schedulers, and in 20 percent of the simulated runs, it produced runs that were at least 20 percent better.

The paper entitled “Capacity-Aware Multicast Algorithms on Heterogeneous Overlay Networks,” coauthored by Zhan Zhang, Shigang Chen, Yibei Ling, and Randy Chow, addresses the problem of multicast for group

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 17, NO. 2, FEBRUARY 2006 97
