Loops account for most of the execution time of computer programs, so optimizing them to run as fast as possible is an ongoing effort. This task is far from trivial; on the contrary, it remains an open area of research, since many irregular loops are hard to parallelize. Such loops generally have loop-carried (DOACROSS) dependencies, and whether those dependencies actually manifest may depend on the runtime context. Many techniques have been proposed to parallelize these loops efficiently; nevertheless, the OpenMP standard, for example, offers no efficient way to do so. This article presents Speculative Task Execution (STE), a technique that executes OpenMP tasks speculatively to accelerate hot-code regions (such as loops) marked by OpenMP directives. It also presents a detailed analysis of how Hardware Transactional Memory (HTM) support can be applied to execute tasks speculatively, and a careful evaluation of an STE implementation based on HTM on modern machines. In particular, we consider the scenario in which speculative tasks are generated by the OpenMP taskloop construct, a scheme we call Speculative Taskloop (STL). As a result, the article provides evidence to support several important claims about the performance of STE over HTM on modern processor architectures. Experimental results reveal that: (a) implementing STL on top of HTM for hot-code regions yields speed-ups of up to 5.39× on IBM POWER8 and up to 2.41× on Intel processors using 4 cores; and (b) STL-ROT, a variant of STL that uses rollback-only transactions (ROTs), achieves speed-ups of up to 17.70× on the IBM POWER9 processor using 20 cores.
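The fragment below is a minimal, illustrative sketch of the general idea only: iterations dispatched by an OpenMP taskloop are executed inside hardware transactions (Intel RTM intrinsics are assumed here), and persistent aborts fall back to serial execution under a lock. The names speculative_loop, update, and fallback_lock are hypothetical, and the sketch does not reproduce the paper's actual runtime-level STL design (for example, how mis-speculation and iteration ordering are handled inside the OpenMP runtime).

```c
#include <immintrin.h>   /* Intel RTM intrinsics: _xbegin/_xend/_xabort (compile with -mrtm -fopenmp) */
#include <stdio.h>
#include <stdlib.h>

#define MAX_RETRIES 3

static volatile int fallback_lock = 0;   /* simple test-and-set lock used as a serial fallback */

static void lock_acquire(void) { while (__sync_lock_test_and_set(&fallback_lock, 1)) ; }
static void lock_release(void) { __sync_lock_release(&fallback_lock); }

/* Hypothetical irregular loop body: it writes through an index array, so
 * cross-iteration conflicts are possible but data-dependent. */
static void update(double *a, const long *idx, long i)
{
    a[idx[i]] += 1.0;
}

/* Each taskloop chunk runs its iterations inside hardware transactions;
 * a data conflict aborts the transaction, and repeated aborts fall back to the lock. */
static void speculative_loop(double *a, const long *idx, long n)
{
    #pragma omp parallel
    #pragma omp single
    #pragma omp taskloop grainsize(64)
    for (long i = 0; i < n; ++i) {
        int retries = 0;
        for (;;) {
            unsigned status = _xbegin();          /* begin a hardware transaction */
            if (status == _XBEGIN_STARTED) {
                if (fallback_lock) _xabort(0xff); /* subscribe to the fallback lock */
                update(a, idx, i);                /* speculative execution of the body */
                _xend();                          /* commit if no conflict occurred */
                break;
            }
            if (++retries >= MAX_RETRIES) {       /* persistent aborts: execute serially */
                lock_acquire();
                update(a, idx, i);
                lock_release();
                break;
            }
        }
    }
}

int main(void)
{
    const long n = 1 << 20;
    double *a = calloc(n, sizeof *a);
    long *idx = malloc(n * sizeof *idx);
    for (long i = 0; i < n; ++i)
        idx[i] = rand() % n;                      /* conflicts appear only when indices collide */
    speculative_loop(a, idx, n);
    printf("a[0] = %f\n", a[0]);
    free(a);
    free(idx);
    return 0;
}
```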