Simultaneous Multithreading (SMT) architectures were proposed to better exploit on-chip parallelism, which is central to performance improvement in modern processors. SMT overcomes the limits of a single thread by fetching and executing instructions from multiple threads that share the processor's resources. Long-latency operations, however, still cause inefficiency in SMT processors. When an instruction must wait for data from a lower level of the memory hierarchy, its dependent instructions cannot proceed and therefore continue to occupy shared on-chip resources for many clock cycles. This introduces undesired inter-thread interference, which degrades both overall system throughput and average thread performance. In practice, instruction fetch policies are responsible for assigning thread priorities at the fetch stage, in an effort to distribute the shared resources among the threads of a core more effectively and to cope with long-latency operations and other runtime thread behavior.

In this paper we propose RUCOUNT, an instruction fetch policy that considers the resource utilization of each thread when assigning priorities. The policy observes instructions in the front-end stages of the pipeline as well as low-level data misses to summarize each thread's resource utilization. Higher priority is granted to the thread(s) using fewer resources, so that resources are distributed more efficiently across the SMT processor. As a result, RUCOUNT has two distinguishing features compared with other policies: it observes hardware resource usage comprehensively, and it monitors limited resource entries. Our experimental results show that, in terms of average performance, RUCOUNT outperforms ICOUNT by 20%, Stall by 10%, DG by 8%, and DWarn by 3%.
Given that its hardware overhead is comparable to that of ICOUNT and DWarn, RUCOUNT is the best-performing policy among those studied.
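To make the idea of utilization-based prioritization concrete, the following is a minimal sketch contrasting an ICOUNT-style ordering (fewest front-end instructions first) with a hypothetical utilization score that also charges each thread for its outstanding low-level data misses. The abstract does not give RUCOUNT's actual formula: the `ThreadState` fields, the linear score, and the weight `MISS_WEIGHT` are all illustrative assumptions, not the paper's method.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    tid: int
    frontend_insts: int      # instructions in front-end stages (the quantity ICOUNT counts)
    pending_l2_misses: int   # outstanding low-level data misses (hypothetical counter)


# ASSUMPTION: an illustrative weight, not taken from the paper.
MISS_WEIGHT = 4


def utilization_score(t: ThreadState, miss_weight: int = MISS_WEIGHT) -> int:
    # Hypothetical summary of resource utilization: front-end occupancy plus
    # a penalty for each long-latency miss that will keep resources held.
    return t.frontend_insts + miss_weight * t.pending_l2_misses


def utilization_priority(threads, miss_weight: int = MISS_WEIGHT):
    # Lower score = fewer resources held = fetched first; ties broken by thread id.
    ordered = sorted(threads, key=lambda t: (utilization_score(t, miss_weight), t.tid))
    return [t.tid for t in ordered]


def icount_priority(threads):
    # ICOUNT-style baseline: fewest front-end instructions first.
    ordered = sorted(threads, key=lambda t: (t.frontend_insts, t.tid))
    return [t.tid for t in ordered]


threads = [
    ThreadState(tid=0, frontend_insts=10, pending_l2_misses=2),
    ThreadState(tid=1, frontend_insts=14, pending_l2_misses=0),
    ThreadState(tid=2, frontend_insts=6, pending_l2_misses=3),
]

if __name__ == "__main__":
    print("ICOUNT-style order:     ", icount_priority(threads))       # [2, 0, 1]
    print("utilization-based order:", utilization_priority(threads))  # [1, 0, 2]
```

In this example thread 2 has the fewest front-end instructions, so the ICOUNT-style ordering fetches it first, even though its three outstanding misses mean it is likely to clog shared resources. Once those misses are counted, it drops to last place. Whether and how RUCOUNT weighs the two sources of information is not specified in the abstract.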