Abstract

Several multithreading techniques have been proposed to reduce the resource underutilization in Very Long Instruction Word (VLIW) processors. Simultaneous MultiThreading (SMT) is a popular technique which improves processor performance by issuing multiple instructions from different threads. SMT requires extra hardware to merge instructions from different threads. The complexity of this hardware increases substantially with the number of threads, limiting the number of threads that can be realistically supported to only 2. Cluster-level Simultaneous MultiThreading (CSMT) is a technique that merges instructions from threads at the cluster level. CSMT has a much lower merging hardware cost and can support a larger number of threads. However, CSMT performance is lower than SMT. In this paper, we evaluate several hardware designs that can support a high number of threads by using a merging scheme that combines both SMT and CSMT merging. For instance, one of the evaluated schemes, which merges the first 2 threads using SMT and the produced merging with other 2 threads by CSMT, achieves performance similar to supporting 4 threads by SMT but maintaining a reasonable merging hardware cost.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call