Abstract

Continuous improvements in technology keep raising processor speed, and this creates a major obstacle for parallel processors implemented on a single chip. The time spent servicing cache misses from all processors out of a slow shared memory limits the performance gain of parallel processing. We propose a new memory system that makes all of its contents available to the processors, so that processors need not access the shared memory serially. Rather than letting one processor access a single location in the shared memory at a time, we force each location to be available to all processors at a specific time. This memory system is fast and simple: it needs no decoders and can use DRAM or SRAM technology efficiently, because the time at which each location is accessed is known ahead of time. Results show that the new memory improves single-processor performance by 350% and the performance of eight parallel processors by 2400%. The new memory decouples the slow memory from the fast processor and makes parallel processors scalable to an infinite number of processors.
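The contrast between serial access and time-scheduled availability can be illustrated with a toy model. This is only a sketch under stated assumptions, not the memory design evaluated in the paper: it assumes a simple round-robin schedule in which location `t % size` is visible to every processor at cycle `t`, and a fixed per-access latency for the conventional shared memory. The function names and parameters are hypothetical.

```python
def broadcast_wait(address, now, size):
    """Cycles a processor waits until `address` is next available.

    Assumption (not from the paper): the memory cycles through its
    locations in order, exposing location (t % size) at cycle t.
    """
    return (address - now) % size


def broadcast_service_time(requests, now, size):
    """Cycles until every pending request has been satisfied.

    All processors observe the memory at once, so requests do not
    queue behind each other; the slowest one sets the total time.
    """
    return max(broadcast_wait(addr, now, size) for addr in requests)


def serial_service_time(requests, latency):
    """Cycles for a conventional shared memory that serves one
    request at a time, each costing `latency` cycles (assumed fixed)."""
    return len(requests) * latency


# Four processors miss on addresses 5, 1, 5 and 7 at cycle 2
# of an 8-location memory.
pending = [5, 1, 5, 7]
broadcast_cycles = broadcast_service_time(pending, now=2, size=8)
serial_cycles = serial_service_time(pending, latency=10)
```

In this model the broadcast service time is bounded by the memory size regardless of how many processors are waiting, whereas the serial service time grows linearly with the number of requests. Because the schedule is fixed, no address decoding is needed to decide which location is visible at a given cycle.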
