Abstract
There is an apparent conflict between the hardware requirements for fast parallel execution and those for fast serial execution. For example, fast vector execution is achieved by maintaining high execution concurrency over extended periods of time. With many operations executing in parallel, the time to carry out individual operations is much less important than the average execution concurrency. Fast serial execution, on the other hand, requires rapid execution of relatively few operations at a time, so hardware concurrency can be sacrificed in favor of short execution times. Fewer registers and memory locations are required, but they must have shorter access times than those needed for parallel execution. We show how to integrate these seemingly conflicting requirements into a single computer, using an asymmetric distribution of hardware and, in some cases, software that allocates variables to appropriate parts of the storage hierarchy.