Abstract
A simulation technique for very large-scale data parallel programs is proposed. In our simulation method, a data parallel program is divided into computation and communication sections. When the control flow of the parallel program does not depend on the contents of network messages, the computation time on each processor can be calculated independently. An instrumentation tool called EXCIT is used to calculate the execution time on the target architecture and to generate message traces. The communication time is then calculated from the message traces by a network simulator, which is generated by the network simulator generating system INSPIRE. With our tool set, the behavior of parallel programs on thousands of processors can be estimated within a practical time span. We demonstrate our method by analyzing the class B problems of the LU and MG programs of the NAS Parallel Benchmarks while varying parameters such as cache size and network bandwidth. We found that communication overhead affects the total execution time considerably, while the cache effect is small.
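The two-phase idea (per-processor computation times fixed in advance, communication time obtained by replaying message traces) can be illustrated with a minimal Python sketch. This is not EXCIT or INSPIRE. It assumes each processor's trace already records estimated computation times plus send and receive events, and it swaps the generated network simulator for a simple latency-plus-bandwidth model that ignores contention. The function `simulate`, the event format, and the non-blocking-send rule are all hypothetical choices made for illustration.

```python
from collections import defaultdict, deque

def simulate(traces, latency, bandwidth):
    """Replay per-processor traces and return each processor's finish time.

    Hypothetical event format (an assumption, not the EXCIT trace format):
      ("compute", seconds)   -- precomputed computation time on this processor
      ("send", dst, nbytes)  -- non-blocking send (toy assumption)
      ("recv", src)          -- blocks until the matching message has arrived
    Network model: arrival = send_time + latency + nbytes / bandwidth
    (a toy stand-in for a real network simulator; no contention).
    """
    n = len(traces)
    clock = [0.0] * n
    pc = [0] * n                   # index of the next event on each processor
    channels = defaultdict(deque)  # (src, dst) -> FIFO of message arrival times
    progress = True
    while progress:                # sweep until no processor can advance
        progress = False
        for p in range(n):
            while pc[p] < len(traces[p]):
                ev = traces[p][pc[p]]
                if ev[0] == "compute":
                    clock[p] += ev[1]
                elif ev[0] == "send":
                    _, dst, nbytes = ev
                    channels[(p, dst)].append(clock[p] + latency + nbytes / bandwidth)
                elif ev[0] == "recv":
                    queue = channels[(ev[1], p)]
                    if not queue:
                        break      # blocked: matching send not replayed yet
                    clock[p] = max(clock[p], queue.popleft())
                else:
                    raise ValueError(f"unknown event {ev!r}")
                pc[p] += 1
                progress = True
    if any(pc[p] < len(traces[p]) for p in range(n)):
        raise RuntimeError("deadlock: some receives were never matched")
    return clock

# Two processors exchanging one 1000-byte message each way.
traces = [
    [("compute", 1.0), ("send", 1, 1000), ("recv", 1)],
    [("compute", 2.0), ("recv", 0), ("send", 0, 1000)],
]
finish = simulate(traces, latency=0.5, bandwidth=1000.0)
print(finish, max(finish))
```

Because the traces do not depend on message contents, the replay order does not change the result, which is what lets the computation phase be handled separately from the network phase. In this example processor 0 finishes at 4.0 and processor 1 at 2.5, so the estimated total time is 4.0.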