A multiprocessor architecture combining fine-grained and coarse-grained parallelism strategies

David J Lilja

doi:10.1016/0167-8191(94)90003-5

Abstract

A wide variety of computer architectures have been proposed that attempt to exploit parallelism at different granularities. For example, pipelined processors and multiple instruction issue processors exploit the fine-grained parallelism available at the machine instruction level, while shared memory multiprocessors exploit the coarse-grained parallelism available at the loop level. Using a register-transfer level simulation methodology, this paper examines the performance of a multiprocessor architecture that combines both coarse-grained and fine-grained parallelism strategies to minimize the execution time of a single application program. These simulations indicate that the best system performance is obtained by using a mix of fine-grained and coarse-grained parallelism in which any number of processors can be used, but each processor should be pipelined to a degree of 2 to 4, or each should be capable of issuing from 2 to 4 instructions per cycle. These results suggest that current high-performance microprocessors, which typically can have 2 to 4 instructions simultaneously executing, may provide excellent components with which to construct a multiprocessor system.

Full Text