Parallel adaptive computing on meta-systems including NOWs

N. Melab, E.-G. Talbi

doi:10.1016/s0167-8191(99)00105-2

Abstract

Load analysis of meta-systems including NOWs or COWs has shown that only a few percent of the available power is used over long periods of time. Therefore, to exploit idle time when executing a parallel application, work must be sent to a machine as soon as it becomes available. Furthermore, to respect the ownership of workstations, work must be stopped as soon as the machine executing it is reclaimed by its owner, and resumed later. As a consequence, users need an adaptive system that reports events related to the arrival and departure of workstations. They also need a parallel adaptive programming methodology that plans the handling of these events. In this paper, we present the MARS system (multi-user adaptive resource scheduler, developed at the LIFL laboratory, Université de Lille I) and its parallel adaptive programming methodology through the block-based Gauss–Jordan algorithm used in numerical analysis to invert large matrices. Moreover, we propose a work scheduling strategy and an application-oriented solution to the fault tolerance issue. We also present experimental results obtained on a DEC/ALPHA COW and a SUN/Sparc4 NOW. The results show that very high absolute efficiencies can be obtained if the block size is well chosen. Finally, we present experiments on the adaptability of the application in a meta-system including the DEC/ALPHA COW and the SUN/Sparc4 NOW. The results show that managing adaptability consumes only a small percentage of the execution time.
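The event handling described in the abstract can be illustrated with a minimal sketch. It is not the MARS interface: the class name AdaptiveScheduler, its three handler methods, and the workstation and block names are all hypothetical, chosen only to show the general idea of sending work to a machine when it becomes available and stopping and re-queuing that work when the owner reclaims the machine.

```python
from collections import deque


class AdaptiveScheduler:
    """Toy master reacting to workstation arrival and departure events.

    Hypothetical illustration only; it does not reproduce the MARS API.
    """

    def __init__(self, work_units):
        self.pending = deque(work_units)  # work not yet assigned
        self.running = {}                 # node name -> unit in progress
        self.done = []                    # completed units, in finish order

    def on_node_available(self, node):
        """An idle node joined: send it work immediately, if any is left."""
        if node not in self.running and self.pending:
            self.running[node] = self.pending.popleft()

    def on_node_reclaimed(self, node):
        """The owner is back: stop the unit and queue it to be resumed later."""
        unit = self.running.pop(node, None)
        if unit is not None:
            self.pending.appendleft(unit)

    def on_work_finished(self, node):
        """A node completed its unit: record it and hand out the next one."""
        unit = self.running.pop(node, None)
        if unit is not None:
            self.done.append(unit)
            self.on_node_available(node)


def demo():
    sched = AdaptiveScheduler(["block-0", "block-1", "block-2"])
    sched.on_node_available("ws-a")  # ws-a starts block-0
    sched.on_node_available("ws-b")  # ws-b starts block-1
    sched.on_node_reclaimed("ws-a")  # block-0 goes back to the queue front
    sched.on_work_finished("ws-b")   # block-1 done, ws-b picks up block-0
    sched.on_work_finished("ws-b")   # block-0 done, ws-b picks up block-2
    sched.on_work_finished("ws-b")   # block-2 done, no work remains
    return sched
```

In this sketch a reclaimed unit is simply restarted by whichever node takes it next. Resuming from saved intermediate state, as a real system would likely need, is left out for brevity.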

Full Text