Abstract
A template algorithm is constructed for the parallel execution of independent iterations of a repetitive loop on a multiprocessor computer with distributed memory. Regardless of the number of processors, the algorithm must ensure efficient utilization of the computing capacity even when the complexities of the iterations and/or the performances of the processors differ substantially. Interprocessor data communication and the control of parallel computations are assumed to be implemented with the Message Passing Interface (MPI), the standard widely used on such systems. Existing methods of loop parallelization are analyzed, and their efficiencies are estimated empirically for various models of iteration nonuniformity.
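The paper's template algorithm is not reproduced in the abstract; as a point of reference, the following is a minimal sketch of one common scheme in this family, dynamic master-worker scheduling of independent iterations over MPI, which tolerates nonuniform iteration costs by handing out work on demand. The iteration count N and the iteration body do_iteration are hypothetical placeholders, not taken from the paper.

```c
/* Sketch: dynamic (master-worker) distribution of independent loop
 * iterations over MPI. Rank 0 hands out one iteration index at a time;
 * workers request more work as soon as they finish, so faster processors
 * or cheaper iterations naturally receive more of the load. */
#include <mpi.h>

#define N        1000   /* total number of independent iterations (assumed) */
#define TAG_WORK 1
#define TAG_STOP 2

/* Stand-in for an iteration whose cost is not known in advance. */
static double do_iteration(int i) {
    double s = 0.0;
    for (int k = 0; k <= i % 997; ++k) s += (double)k / (i + 1);
    return s;
}

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                 /* master: dispatch iterations on demand */
        int next = 0, active = size - 1, idx;
        MPI_Status st;
        while (active > 0) {
            /* a message from any worker is a request for the next index */
            MPI_Recv(&idx, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &st);
            if (next < N) {
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK,
                         MPI_COMM_WORLD);
                ++next;
            } else {                 /* no work left: retire this worker */
                MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP,
                         MPI_COMM_WORLD);
                --active;
            }
        }
    } else {                         /* worker: request, compute, repeat */
        int i = -1;
        MPI_Status st;
        MPI_Send(&i, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD); /* first request */
        for (;;) {
            MPI_Recv(&i, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
            if (st.MPI_TAG == TAG_STOP) break;
            do_iteration(i);         /* result collection omitted for brevity */
            MPI_Send(&i, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD);
        }
    }
    MPI_Finalize();
    return 0;
}
```

The per-iteration request/reply traffic is the price of this scheme; the paper compares such dynamically balanced approaches with static partitioning under different models of iteration nonuniformity.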