Abstract

This article describes and evaluates a small, out-of-order, simultaneous multithreaded (SMT) core architecture suitable for power-constrained microprocessors, such as manycore microprocessors for high-performance computing. The architecture does not require a reorder buffer (ROB) or physical registers for register renaming and instruction retirement. Instead, it uses a large number of virtual register IDs for register renaming, and a logical register file with multiple contexts. The architecture improves total thread execution throughput by using two register contexts to support SMT execution of parallel workloads. Moreover, it improves instruction-level parallelism (ILP) and execution performance when running single-thread applications. In addition to eliminating the reorder buffer and the physical renaming register file, the architecture minimises the logical register file hardware by using the two SMT register contexts and an in-cell register file context fusion mechanism to recover from branch mispredictions. We present results from SPEC 2006 benchmarks running on a SimpleScalar performance simulator of our architecture. Our simulation measurements show a 5% single-thread performance improvement and a 9.6% two-thread SMT performance improvement over a conventional SMT core architecture with a reorder buffer.

