We consider the problem of processing a given number of tasks on a given number of processors as quickly as possible when only vague information about the processing time of a task is available before it is completed. Whenever a processor is idle, it can be assigned, at the price of a certain overhead, a portion, called a chunk, of the unassigned tasks. The goal is to minimize the makespan, that is, the time that passes until all the tasks are completed. The difficulty then is to find the optimal tradeoff between the processors' load balance, which is favoured by having small, and therefore many, chunks, and the total scheduling overhead, which is lower when there are fewer chunks. This scheduling problem has been the subject of intensive research in the past, and a large variety of heuristics have been proposed. Its mathematical analysis, however, turned out to be difficult even for simplistic models of the vague-information issue, and little theoretical work has been presented to date. In this work we present a novel theoretical model that covers a multitude of natural vague-information scenarios, and for which we can prove general upper and lower bounds on the achievable makespan. From this we derive optimal bounds and algorithms for a whole variety of specific scenarios, including the modelling of task processing times as independent, identically distributed random variables, which guided the design of most of the previously existing heuristics. Unlike traditional approaches, our model neither ignores a priori knowledge of the input (the processing times) nor does it restrict the distribution of the input, but instead works with the concepts of an a priori estimate of the processing times, which is implicit in every algorithm, and a measure for the deviation of this estimate from the actual processing times, which is not known until all the tasks are completed.
Read full abstract