Abstract

The exploitation of instruction-level parallelism in superscalar architectures is limited by data and control dependencies. Simultaneous multi-threaded (SMT) architectures can exploit another level of parallelism, thread-level parallelism, by fetching and executing instructions from different tasks at the same time. While one task is blocked by control or data dependencies, other tasks may continue executing, thus masking latencies caused by mispredicted branches and memory accesses and increasing the utilization of functional units. However, the design of SMT architectures brings new challenges, such as determining the most efficient way to share resources among different threads. In this paper, we present different branch prediction topologies for SMT architectures. We show that the best results are obtained by matching the number of i-cache modules (fetch width) to the number of branch prediction modules (number of lookups and updates), and that increasing the number of modules also helps increase clock rates. Moreover, contention on the branch prediction lookup and update buses cannot be ignored in such architectures.
