Abstract

CMOS evolution by lateral lithographic shrinkage has encountered an impediment: wires do not scale well. As a result, it would appear that the clock race is over and that the future of computing lies in multicore or parallel processing. In a prior paper [1] we explored the implications of Amdahl's figure of merit (FOM), which suggests that for an algorithm to demonstrate a large throughput improvement through parallelization, the fraction of non-parallelizable code (also called serial code) must be less than about 4%. We observed that memory latency, synchronization, and inter-processor communication latency can masquerade as non-parallelizable code. There are no doubt certain applications in which the non-parallelizable fraction is below 4%, and others in which the large memory needed just to hold the data justifies the use of parallel processors in any case. For the broad class of other code, however, the benefit is at best in doubt, and the Amdahl figure of merit suggests worse: a favorable impact is unlikely. In this paper we continue the dialogue begun in the earlier paper by examining a small demonstration processor that achieves high performance by pursuing the traditional route of a higher clock rate, obtained through improvements in device and interconnection technology. Because the processor uses a BiCMOS process and requires a 3D memory for Memory Wall mitigation, it is important to address thermal issues. Preliminary analyses are, perhaps unexpectedly, somewhat favorable.
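The sensitivity to the serial fraction can be illustrated with a minimal sketch. It assumes the textbook form of Amdahl's law, speedup = 1 / (s + (1 - s) / N), which is not necessarily the exact figure of merit defined in [1]. The function names `amdahl_speedup` and `amdahl_limit` are hypothetical and chosen only for this illustration.

```python
# Illustrative sketch only: textbook Amdahl's law, assumed here as a stand-in
# for the figure of merit discussed in [1]. Function names are hypothetical.


def amdahl_speedup(serial_fraction: float, processors: int) -> float:
    """Speedup of a fixed-size workload on `processors` cores.

    `serial_fraction` is the share of execution time that cannot be
    parallelized (0.0 = perfectly parallel, 1.0 = fully serial).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def amdahl_limit(serial_fraction: float) -> float:
    """Upper bound on speedup as the processor count grows without limit."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    # Compare a few serial fractions around the ~4% threshold from the text.
    for s in (0.01, 0.04, 0.10):
        row = ", ".join(
            f"N={n}: {amdahl_speedup(s, n):.1f}x" for n in (8, 64, 1024)
        )
        print(f"serial={s:.0%}  {row}  limit={amdahl_limit(s):.1f}x")
```

Under this assumed form, a 4% serial fraction caps the speedup at 1/0.04 = 25 no matter how many processors are added. Any latency that behaves like serial time (memory, synchronization, inter-processor communication) raises the effective serial fraction and lowers that ceiling further.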

