A data parallel language such as C* has a number of advantages over conventional hypercube programming languages. The algorithm design process is simpler, because (1) message passing is invisible, (2) race conditions are nonexistent, and (3) the data can be put into a one-to-one correspondence with the virtual processors. Since data are mapped to virtual processors, rather than physical processors, it is easier to move algorithms implemented on one size hypercube to a larger or smaller system. We outline the design of a C* compiler for a hypercube multicomputer. Our design goals are to minimize the amount of time spent synchronizing, limit the number of interprocessor communications, and make each physical processor's emulation of a set of virtual processors as efficient as possible. We have hand translated three benchmark programs and compared their performance with that of ordinary C programs. All three programs—matrix multiplication, LU decomposition, and hyperquicksort—achieve reasonable speedup on a commercial hypercube, even when solving problems of modest size. On a 64-processor NCUBE/7, the C* matrix multiplication program achieves a speedup of 27 when multiplying two 64 × 64 matrices, the hyperquicksort program achieves a speedup of 10 when sorting 16,384 integers, and LU decomposition attains a speedup of 7 when decomposing a 256 × 256 system of linear equations. We believe the degradation in machine performance resulting from the use of a data parallel language will be more than compensated for by the increase in programmer productivity.
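To make the data parallel model concrete, the sketch below is written in plain C (it is an illustration only, not code from the paper, and the array size N is a hypothetical problem size). In a data parallel language such as C*, the elementwise addition would be expressed once and applied across all virtual processors, with one virtual processor per element and any message passing handled by the compiler; in conventional C the programmer writes the explicit sequential loop shown here.

```c
/* Illustrative sketch only: plain C, not the paper's C* source.
 * It shows the computation that a data parallel language would express
 * as a single elementwise operation over virtual processors. */
#include <stdio.h>

#define N 8   /* hypothetical problem size, one element per virtual processor */

int main(void) {
    int a[N], b[N], c[N];

    for (int i = 0; i < N; i++) {   /* initialize the operands */
        a[i] = i;
        b[i] = 2 * i;
    }

    /* Conventional C: an explicit sequential loop over the elements.
     * In C*, conceptually, each element is owned by one virtual processor
     * and the addition is written once for all of them. */
    for (int i = 0; i < N; i++) {
        c[i] = a[i] + b[i];
    }

    for (int i = 0; i < N; i++) {
        printf("%d ", c[i]);
    }
    printf("\n");
    return 0;
}
```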