Abstract
We propose several schemes to improve the scalability, reduce the complexity and delays, and increase the throughput of dynamic scheduling in SMT processors. Our first design is an adaptation of the proposed instruction packing to SMT. Instruction packing opportunistically packs two instructions (possibly from different threads), each with at most one nonready source operand at the time of dispatch, into the same issue queue entry. Our second design, termed 2OP_BLOCK, takes these ideas one step further and completely avoids the dispatching of the instructions with two nonready source operands. This technique has several advantages. First, it reduces the scheduling complexity (and the associated delays) as the logic needed to support the instructions with two nonready source operands is eliminated. More surprisingly, 2OP_BLOCK simultaneously improves the performance as the same issue queue entry may be reallocated multiple times to the instructions with at most one nonready source (which usually spends fewer cycles in the queue) as opposed to hogging the entry with an instruction which enters the queue with two nonready sources. For, schedulers with the capacity to hold 64 instructions on a 4-way SMT, the 2OP_BLOCK design outperforms the traditional queue by 14 percent, on average, and at the same time results in a 10 percent reduction in the overall scheduling delay. We also present mechanisms to support speculative scheduling with 2OP_BLOCK and introduce the hybrid scheme that dynamically switches between 2OP_BLOCK and instruction packing modes depending on the workload characteristics, to achieve further performance gains
Submitted Version (Free)
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have