Abstract
Clusters of workstations are used extensively for solving computationally intensive scientific problems. However, there is limited support for quality of service (QoS) based distributed computing on commercial off-the-shelf (COTS) clusters. This limitation has restricted successful deployment of distributed real-time high-performance computing applications to customized and dedicated embedded multi-processor systems. This paper describes research work that attempts to provide a cluster platform that can guarantee distributed applications access to computational and communication resources. The authors have developed PromisQoS, an architecture that supports execution of hard real-time distributed applications on a Linux cluster while providing high-throughput and low-latency communication using Myrinet. PromisQoS consists of three major components: Hare, BDM-RT, and Turtle. Hare is a prototype implementation of the time-based QoS channels specified by the real-time message passing interface (MPI/RT 1.1) standard. BDM-RT is a low-level messaging library that provides deterministic communication latency and bandwidth on Myrinet. Turtle, a variant of RT-Linux, is the real-time operating system that provides guaranteed computation time. This work demonstrates that it is possible to deploy hard real-time distributed applications on COTS clusters and underlines the significance of the MPI/RT API for distributed high-performance computing applications that require QoS.