Abstract

Simultaneous Multi-Threading (SMT) is a processor design technique that supports concurrent execution of instructions from multiple threads in every cycle by sharing the key datapath components. In an SMT architecture, the shared resources normally include the physical register file, the Issue Queue (IQ), the functional units, the write buffer, and the cache memory. Efficient utilization of these shared resources is critical to achieving a high performance gain. The physical rename register file is one of the most critical shared resources in the SMT architecture because it sits at the forefront of the pipeline stages. Sharing the physical registers among threads means that an SMT processor needs fewer registers than multiple superscalar processors deployed to achieve a similar throughput. However, because the registers are shared, excessive occupancy of the physical register file by slower threads can leave too few registers for the other threads in the system and thus degrade overall performance. In this paper, we propose an intelligent fetching algorithm for efficient management of the shared physical register file. Although the primary focus is on managing the physical register file effectively, the algorithm indirectly controls the other shared resources downstream in the pipeline as well. The main goal is a simple resource management scheme that achieves a considerable performance gain without incurring substantial processing or hardware overhead and without requiring modifications to the other pipeline stages. We demonstrate that temporarily suspending the slow threads in the fetch stage can improve overall system performance by a significant margin. Improvements of up to 63% and 68% are achieved when the proposed scheme is applied to the 4-threaded and the 8-threaded systems, respectively. The throughput of an 8-threaded system with 320 register file entries is significantly higher than that of the default system with 416 register entries, indicating a resource saving of 60%.
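
The idea of suspending threads at the fetch stage can be illustrated with a minimal sketch. The abstract does not state how slow threads are identified, so the occupancy thresholds below, and the names `ThreadState`, `update_fetch_gating`, `share_limit`, and `resume_limit`, are hypothetical choices made for illustration rather than the algorithm proposed in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ThreadState:
    tid: int
    regs_held: int           # physical rename registers currently held by this thread
    suspended: bool = False  # True while the thread is barred from fetching


def update_fetch_gating(threads: List[ThreadState], total_regs: int,
                        share_limit: float = 0.5,
                        resume_limit: float = 0.25) -> List[int]:
    """Return the ids of threads allowed to fetch this cycle.

    Assumed policy (not taken from the paper): a thread is suspended once it
    holds more than `share_limit` of the register file, and resumes only after
    its share falls below `resume_limit`. The gap between the two thresholds
    keeps a thread from toggling on every cycle.
    """
    for t in threads:
        share = t.regs_held / total_regs
        if not t.suspended and share > share_limit:
            t.suspended = True
        elif t.suspended and share < resume_limit:
            t.suspended = False

    # Guard against stalling the whole machine: if every thread ended up
    # suspended, release the one holding the fewest registers.
    if threads and all(t.suspended for t in threads):
        min(threads, key=lambda t: t.regs_held).suspended = False

    return [t.tid for t in threads if not t.suspended]
```

Because the decision is taken only in the fetch stage, a sketch like this leaves the later pipeline stages untouched: a suspended thread stops requesting new rename registers, and the registers it already holds are released as its in-flight instructions commit.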

