Abstract

One of the challenges in achieving good performance on multicore architectures is the effective utilization of the underlying memory hierarchy. While this is already an issue for single-core architectures, it is a critical problem for multicore chips. In this paper, we formulate the unified multicore model (UMM) to help understand the fundamental limits on cache performance on these architectures. The UMM seamlessly handles different types of multicore processors with varying degrees of cache sharing at different levels. We demonstrate that our model can be used to study a variety of multicore architectures on a variety of applications. In particular, we use it to analyze an option pricing problem using the trinomial model and develop an algorithm for it that has near-optimal memory traffic between cache levels. We have implemented the algorithm on two Quad-Core Intel Xeon 5310 1.6 GHz processors (8 cores in total). It achieves a peak performance of 19.5 GFLOPs, which is 38% of the theoretical peak of the multicore system. We demonstrate that our algorithm outperforms compiler-optimized and auto-parallelized code by a factor of up to 7.5.
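As a rough consistency check on the figures above, the following worked calculation assumes that each core can complete 4 double-precision floating-point operations per cycle. That per-cycle rate is an assumption made here for illustration; the abstract does not state it.

```latex
\begin{align*}
P_{\text{peak}} &= 8\ \text{cores} \times 1.6\ \text{GHz} \times 4\ \tfrac{\text{flops}}{\text{cycle}} = 51.2\ \text{GFLOPs},\\[4pt]
\frac{P_{\text{achieved}}}{P_{\text{peak}}} &= \frac{19.5}{51.2} \approx 0.38 .
\end{align*}
```

Under that assumption the ratio agrees with the 38% quoted in the abstract.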

Highlights

  • Processors with multiple cores are being manufactured by a number of vendors including IBM, Sun, Intel, AMD and Tilera

  • The multicore memory hierarchy game (MMHG) is a pebbling game played on directed acyclic graphs (DAGs) that models computations done on the unified multicore model (UMM)

  • In this paper we introduce a new model for the cache hierarchy of multicore chips

Summary

Introduction

Processors with multiple cores are being manufactured by a number of vendors including IBM, Sun, Intel, AMD and Tilera. We derive lower bounds on memory traffic between different levels of the memory hierarchy for financial and scientific computations. We then design a multicore algorithm and analyze it on the model to determine its memory traffic at each level of the hierarchy. We compare this traffic with the derived lower bounds; if the proposed algorithm is far from optimal, we improve it and repeat the process.
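For readers unfamiliar with the financial computation studied here, the sketch below shows a plain trinomial-lattice pricer for a European call. It is a textbook formulation, not the paper's cache-optimized multicore algorithm. The function name `trinomial_call`, its parameters, the choice of a European call, and the particular up/down factors and probabilities are all assumptions made for illustration.

```python
import math


def trinomial_call(s0, k, r, sigma, t, n):
    """Price a European call on a recombining trinomial lattice (illustrative sketch).

    s0: spot price, k: strike, r: risk-free rate, sigma: volatility,
    t: maturity in years, n: number of time steps.
    """
    dt = t / n
    u = math.exp(sigma * math.sqrt(2.0 * dt))   # up factor; down factor is 1/u
    a = math.exp(r * dt / 2.0)
    b = math.exp(sigma * math.sqrt(dt / 2.0))
    pu = ((a - 1.0 / b) / (b - 1.0 / b)) ** 2   # probability of an up move
    pd = ((b - a) / (b - 1.0 / b)) ** 2         # probability of a down move
    pm = 1.0 - pu - pd                          # probability of staying level
    disc = math.exp(-r * dt)                    # one-step discount factor

    # Payoffs at maturity: 2n + 1 nodes, index i corresponds to j = i - n.
    values = [max(s0 * u ** j - k, 0.0) for j in range(-n, n + 1)]

    # Backward induction: each level has two fewer nodes than the one after it.
    # Node i at the earlier level has children i (down), i + 1 (middle), i + 2 (up).
    for step in range(n, 0, -1):
        values = [
            disc * (pd * values[i] + pm * values[i + 1] + pu * values[i + 2])
            for i in range(2 * step - 1)
        ]
    return values[0]


if __name__ == "__main__":
    print(trinomial_call(100.0, 100.0, 0.05, 0.2, 1.0, 200))
```

Each time step in this sketch sweeps the whole array of node values once. When the array no longer fits in a given cache level, every step streams it in again from the next level, and this is the kind of inter-level memory traffic that the lower bounds are compared against.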

Unified multicore model
The multicore memory hierarchy game
Uni-processor lower bounds
Multicore lower bounds
Option pricing using trinomial model
Multicore algorithm and implementation
Vanilla algorithm
Applications of the methodology
Findings
Conclusions
