Single-chip Multiprocessor Research Articles

The instruction-level parallelism found in a conventional instruction stream is limited. Studies have shown the limits of processor utilization even for today's superscalar microprocessors. One solution is the additional utilization of more coarse-grained parallelism. The main approaches are the (single) chip multiprocessor and the multithreaded processor, which optimize the throughput of multiprogramming workloads rather than single-thread performance. The chip multiprocessor integrates two or more complete processors on a single chip. Every unit of a processor is duplicated and used independently of its copies on the chip. In contrast, the multithreaded processor is able to pursue two or more threads of control in parallel within the processor pipeline. Unused instruction slots, which arise from pipelined execution of single-threaded programs by a contemporary microprocessor, are filled by instructions of other threads within a multithreaded processor. The execution units are multiplexed between the thread contexts that are loaded in the register sets. Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple threads each cycle. Simultaneous multithreaded processors combine the multithreading technique with a wide-issue superscalar processor such that the full issue bandwidth is utilized by potentially issuing instructions from different threads simultaneously. This survey paper explains and classifies the various multithreading techniques in research and in commercial microprocessors and compares multithreaded processors with chip multiprocessors.
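The slot-filling idea in this abstract can be sketched with a toy model. The code below is only an illustration under stated assumptions: the function names, the fixed-priority thread order, the 4-wide issue width, and the per-cycle ready-instruction counts in TRACE are all hypothetical and are not taken from the survey.

```python
def issue_cycle(ready, width):
    """Fill up to `width` issue slots in one cycle.

    `ready` lists how many instructions each thread could issue this cycle.
    Threads are served in list order (a simple fixed-priority assumption).
    Returns the number of instructions issued per thread.
    """
    issued = []
    free = width
    for count in ready:
        take = min(count, free)
        issued.append(take)
        free -= take
    return issued


def utilization(trace, width, threads):
    """Fraction of issue slots used when only the first `threads` threads run."""
    used = sum(sum(issue_cycle(cycle[:threads], width)) for cycle in trace)
    return used / (width * len(trace))


# Hypothetical ready-instruction counts: one row per cycle, one column per thread.
TRACE = [
    [1, 2, 3],
    [0, 4, 1],
    [2, 0, 2],
    [1, 1, 0],
]
WIDTH = 4  # assumed issue width

single = utilization(TRACE, WIDTH, threads=1)  # single-threaded superscalar
smt = utilization(TRACE, WIDTH, threads=3)     # slots left empty are offered to other threads

print(f"single thread: {single:.3f}")
print(f"three threads: {smt:.3f}")
```

With this made-up trace the single thread leaves most slots empty, while three threads together fill nearly all of them. The model ignores fetch bandwidth, dependences across cycles, and cache effects, so it shows only the mechanism, not realistic utilization figures.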

To achieve high performance, contemporary computer systems rely on two forms of parallelism: instruction-level parallelism (ILP) and thread-level parallelism (TLP). Wide-issue superscalar processors exploit ILP by executing multiple instructions from a single program in a single cycle. Multiprocessors (MP) exploit TLP by executing different threads in parallel on different processors. Unfortunately, both parallel processing styles statically partition processor resources, thus preventing them from adapting to dynamically changing levels of ILP and TLP in a program. With insufficient TLP, processors in an MP will be idle; with insufficient ILP, multiple-issue hardware on a superscalar is wasted. This article explores parallel processing on an alternative architecture, simultaneous multithreading (SMT), which allows multiple threads to compete for and share all of the processor's resources every cycle.

The most compelling reason for running parallel applications on an SMT processor is its ability to use thread-level parallelism and instruction-level parallelism interchangeably. By permitting multiple threads to share the processor's functional units simultaneously, the processor can use both ILP and TLP to accommodate variations in parallelism. When a program has only a single thread, all of the SMT processor's resources can be dedicated to that thread; when more TLP exists, this parallelism can compensate for a lack of per-thread ILP.

We examine two alternative on-chip parallel architectures for the next generation of processors. We compare SMT and small-scale, on-chip multiprocessors in their ability to exploit both ILP and TLP. First, we identify the hardware bottlenecks that prevent multiprocessors from effectively exploiting ILP. Then, we show that because of its dynamic resource sharing, SMT avoids these inefficiencies and benefits from being able to run more threads on a single processor. The use of TLP is especially advantageous when per-thread ILP is limited. The ease of adding additional thread contexts on an SMT (relative to adding additional processors on an MP) allows simultaneous multithreading to expose more parallelism, further increasing functional unit utilization and attaining a 52% average speedup (versus a four-processor, single-chip multiprocessor with comparable execution resources).

This study also addresses an often-cited concern regarding the use of thread-level parallelism or multithreading: interference in the memory system and branch prediction hardware. We find that multiple threads cause interthread interference in the caches and place greater demands on the memory system, thus increasing average memory latencies. By exploiting thread-level parallelism, however, SMT hides these additional latencies, so that they have only a small impact on total program performance. We also find that for parallel applications, the additional threads have minimal effects on branch prediction.
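The contrast between static partitioning and dynamic sharing can be made concrete with a small back-of-the-envelope sketch. Everything in it is an assumption chosen for illustration: the machine shapes (four 2-issue cores versus one 8-issue SMT with eight contexts), the function names, and the per-thread ILP values. It does not reproduce the study's simulator or its 52% result.

```python
def cmp_throughput(ilp_per_thread, cores, width_per_core):
    """Static partitioning: one thread per core, each capped by its core's width."""
    running = ilp_per_thread[:cores]
    return sum(min(ilp, width_per_core) for ilp in running)


def smt_throughput(ilp_per_thread, contexts, total_width):
    """Dynamic sharing: all loaded threads draw from one pool of issue slots."""
    running = ilp_per_thread[:contexts]
    return min(sum(running), total_width)


# Hypothetical machines with the same total issue width (8 slots per cycle).
CORES, WIDTH_PER_CORE = 4, 2
CONTEXTS, TOTAL_WIDTH = 8, 8

# Hypothetical workloads: instructions each thread could issue per cycle.
WORKLOADS = {
    "one thread, high ILP": [6],
    "four threads, mixed ILP": [1, 1, 3, 3],
    "eight threads, low ILP": [1] * 8,
}

results = {
    name: (
        cmp_throughput(ilp, CORES, WIDTH_PER_CORE),
        smt_throughput(ilp, CONTEXTS, TOTAL_WIDTH),
    )
    for name, ilp in WORKLOADS.items()
}

for name, (mp_ipc, smt_ipc) in results.items():
    print(f"{name}: MP issues {mp_ipc}/cycle, SMT issues {smt_ipc}/cycle")
```

In this idealized model the multiprocessor loses slots in two ways: a single high-ILP thread cannot use the other cores, and low-ILP threads cannot lend their unused slots to neighbours. The SMT figure is an upper bound, since the sketch leaves out the cache and memory interference that the abstract discusses.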

Articles published on Single-chip Multiprocessor

Multithreaded Processors

Are single-chip multiprocessors in reach?

Architecture of the Atlas chip-multiprocessor: dynamically parallelizing irregular applications

A 250-MHz single-chip multiprocessor for audio and video signal processing

Architectural support for scalable speculative parallelization in shared-memory multiprocessors

A scalable approach to thread-level speculation

Resynchronization for multiprocessor DSP systems

Converting thread-level parallelism to instruction-level parallelism via simultaneous multithreading

A study on snoop cache systems for single-chip multiprocessors

Performance evaluation of a single‐chip digital signal processor based multimedia system using the Abingdon Cross benchmark

The case for a single-chip multiprocessor

The case for a single-chip multiprocessor

The microprocessor for scientific computing in the year 2000

Simultaneous multithreading

Exploring the design space for a shared-cache multiprocessor

A single-chip multiprocessor for multimedia: the MVP
