Abstract

We consider the problem of finding an allocation of program modules onto processors of a distributed computing system such that the reliability of successfully executing these modules is maximized. The distributed system consists of a number of processors interconnected by means of communication links. Certain constraints such as storage and load limits may be present at each processor. At any point of time, each component of the distributed system (processor or communication link) can exist in either of two states — operational or failed. The probability of a component being operational is given. A program module can be executed on any one of a set of processors. For execution, it requires access to certain data files. If a particular file it requires is not available locally, it has to access the file remotely, and for the remote access to be possible, at least one path (sequence of links and processors) from the processor at which the program module is executing, to one of the processors where the required file is available, must be operational. To improve reliability, there may be multiple copies of certain files, dispersed at various processors. Our aim is to allocate the program modules to processors in a manner that maximizes the probability of it being able to successfully access all the files it requires for execution, and the allocation should not violate any of the constraints. This problem is known to be NP-hard. We use a state space search technique — the A ∗ algorithm to obtain an optimal allocation. We also present a heuristic algorithm which obtains sub-optimal allocations in a reasonable amount of computation time. Through simulations over a wide range of parameters, we demonstrate the effectiveness of our approach.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call