Chip Multithreading Research Articles

Irregular and dynamic applications, such as graph problems and agent-based simulations, often require fine-grained parallelism to achieve good performance. However, current multicore processors only provide architectural support for coarse-grained parallelism, making it necessary to use software-based multithreading environments to effectively implement fine-grained parallelism. Although these software-based environments have demonstrated superior performance over heavyweight, OS-level threads, they are still limited by the significant overhead involved in thread management and synchronization. In order to address this, we propose a Lightweight Chip Multi-Threaded (LCMT) architecture that further exploits thread-level parallelism (TLP) by incorporating direct architectural support for an “unlimited” number of dynamically created lightweight threads with very low thread management and synchronization overhead. The LCMT architecture can be implemented atop a mainstream architecture with minimum extra hardware to leverage existing legacy software environments. We compare the LCMT architecture with a Niagara-like baseline architecture. Our results show up to 1.8X better scalability, 1.91X better performance, and more importantly, 1.74X better performance per watt, using the LCMT architecture for irregular and dynamic benchmarks, when compared to the baseline architecture. The LCMT architecture delivers similar performance to the baseline architecture for regular benchmarks.

The second in the Niagara series of processors (Niagara2) from Sun Microsystems is based on the power-efficient chip multi-threading (CMT) architecture optimized for Space, Watts (Power), and Performance (SWaP) [SWap Rating = Performance/(Space <sub xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">*</sub> Power) ]. It doubles the throughput performance and performance/watt, and provides >10times improvement in floating point throughput performance as compared to UltraSPARC T1 (Niagara1). There are two 10 Gb Ethernet ports on chip. Niagara2 has eight SPARC cores, each supporting concurrent execution of eight threads for 64 threads total. Each SPARC core has a floating point and graphics unit and an advanced cryptographic unit which provides high enough bandwidth to run the two 10 Gb Ethernet ports encrypted at wire speeds. There is a 4 MB Level2 cache on chip. Each of the four on-chip memory controllers controls two FBDIMM channels. Niagara2 has 503 million transistors on a 342 mm <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">2</sup> die packaged in a flip-chip glass ceramic package with 1831 pins. The chip is built in Texas Instruments' 65 nm 11LM triple-Vt CMOS process. It operates at 1.4 GHz at 1.1 V and consumes 84 W.

Chip Multithreading Research Articles

Related Topics

Articles published on Chip Multithreading

Lightweight Chip Multi-Threading (LCMT): Maximizing Fine-Grained Parallelism On-Chip

Architecture and Physical Implementation of a Third Generation 65 nm, 16 Core, 32 Thread Chip-Multithreading SPARC Processor

A continuation-based noninterruptible multithreading processor architecture

Implementation of an 8-Core, 64-Thread, Power-Efficient SPARC Server on a Chip

A Power-Efficient High-Throughput 32-Thread SPARC Processor

A chip multithreaded processor for network-facing workloads

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Chip Multithreading Research Articles

Related Topics

Articles published on Chip Multithreading

Lightweight Chip Multi-Threading (LCMT): Maximizing Fine-Grained Parallelism On-Chip

Architecture and Physical Implementation of a Third Generation 65 nm, 16 Core, 32 Thread Chip-Multithreading SPARC Processor

A continuation-based noninterruptible multithreading processor architecture

Implementation of an 8-Core, 64-Thread, Power-Efficient SPARC Server on a Chip

A Power-Efficient High-Throughput 32-Thread SPARC Processor

A chip multithreaded processor for network-facing workloads