Topic 7 Parallel Computer Architecture and ILP

Theo Ungerer, Josep-Lluis Larriba-Pey, Pedro Trancoso, Kevin Skadron

doi:10.1007/11549468_55

Abstract

We welcome you to the Parallel Computer Architecture and Instruction Level Parallelism sessions of the Euro-Par 2005 conference, held in Lisboa, Portugal.

Instruction Level Parallelism (ILP) and parallel processing techniques are present in most contemporary computing systems, and both are important and growing research fields. ILP research aims to extract fine-grained parallelism as well as thread-level parallelism, not only from scientific code but also from irregular, general-purpose code.

The scope of this topic includes parallel computer architectures, processor architecture and microarchitecture, the impact of emerging microprocessor architectures on parallel computer architectures, innovative memory designs that hide and reduce access latency, multi-threading, and the impact of emerging applications on parallel computer architecture design.

This year 39 papers were submitted to this topic area. The majority came from the area of processor architecture, and relatively few came from parallel systems. Of the submissions, 10 were accepted as full papers for the conference (26% acceptance rate). We are grateful to our referees for lending us their expertise and providing rigorous reviews. The accepted papers are grouped into three sessions according to the topic covered: Branch Prediction and Memory Hierarchy, Instruction Level Parallelism, and Parallel and Reconfigurable Architectures.

In the first session, Monchiero and Palermo present the Combined Perceptron Branch Predictor, which consists of two concurrent perceptron-like neural networks. Moure et al. propose a mechanism, Target Encoding, that achieves a better ratio between predictor accuracy and predictor size. Shi and Lee propose an efficient solution for scaling the L1 cache, based on register-guided dynamic partitioning of memory reference instructions for a partitioned L1 data cache. Canal et al. present a scheme that compresses all values passing through a processor in order to reduce energy consumption.

In the second session, Zmily et al. introduce a block-aware ISA that supports accurate instruction delivery, improving energy consumption over traditional and decoupled front-ends. Sharkey and Ponomarev propose a non-uniform instruction scheduler that achieves smaller scheduling delays. The same authors also present an efficient wakeup-free instruction scheduler: instruction recirculation.

Finally, the third session starts with a work by Almasi et al. describing early experiments on a 16384-node BlueGene/L. Vandeputte et al. analyze and improve the performance of state-of-the-art phase predictors, which are useful for hardware adaptation. Bardisa et al. present a lightweight directory architecture.
