Automatic Prefetch and Modulo Scheduling Transformations for the Cell BE Architecture

Nikola Vujic, Marc Gonzalez, Eduard Ayguade, Xavier Martorell

doi:10.1109/tpds.2009.97

Abstract

Ease of programming is one of the main requirements for the broad acceptance of multicore systems that lack hardware support for transparent data transfer between local and global memories. A software cache is a robust way to give the user a transparent view of the memory architecture, but it can suffer from poor performance. In this paper, we propose a hierarchical, hybrid software-cache architecture designed to enable prefetch techniques. Memory accesses are classified at compile time into two classes: high locality and irregular. Our approach then steers each memory reference toward one of two cache structures, each optimized for its access pattern. These structures allow high-level compiler optimizations to aggressively unroll loops, reorder cache references, and/or transform surrounding loops so as to practically eliminate the software-cache overhead in the innermost loop. The cache design enables automatic prefetch and modulo scheduling transformations. Performance evaluation indicates that the optimized software-cache structures combined with the proposed prefetch techniques translate into speedups between 10 and 20 percent. As a result of the proposed technique, performance on the Cell BE processor is similar to that of a modern server-class multicore such as the IBM PowerPC 970MP processor for a set of parallel NAS applications.
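The idea of steering references to two access-specific paths can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: the line size `LINE`, the `Stats` counters, and the functions `dma_get`, `sum_naive`, `sum_prefetched`, and `gather_irregular` are all hypothetical names introduced here, the irregular path is modeled as a plain direct-mapped cache, and DMA transfers are modeled as list slices. It only counts tag checks and line transfers to show why hoisting the lookup out of the innermost loop and requesting the next line early removes per-element cache overhead.

```python
LINE = 4  # elements per cache line (hypothetical size)


class Stats:
    """Counters for the simulated software cache."""

    def __init__(self):
        self.lookups = 0  # software tag checks executed
        self.dma = 0      # line transfers from global memory


def dma_get(mem, line_no, stats):
    """Model a DMA transfer of one line from global memory to the local store."""
    stats.dma += 1
    start = line_no * LINE
    return mem[start:start + LINE]


def sum_naive(mem, stats):
    """Baseline: one tag check per element, inside the innermost loop."""
    tag, buf, total = None, None, 0
    for i in range(len(mem)):
        stats.lookups += 1
        line_no = i // LINE
        if tag != line_no:  # miss: fetch the line and wait for it
            buf = dma_get(mem, line_no, stats)
            tag = line_no
        total += buf[i % LINE]
    return total


def sum_prefetched(mem, stats):
    """High-locality path: one check per line, next line requested early."""
    n_lines = (len(mem) + LINE - 1) // LINE
    if n_lines == 0:
        return 0
    total = 0
    nxt = dma_get(mem, 0, stats)  # prologue: fetch the first line
    for line_no in range(n_lines):
        stats.lookups += 1
        cur = nxt
        if line_no + 1 < n_lines:
            # On real hardware this transfer would overlap with the
            # computation below; here it is only counted.
            nxt = dma_get(mem, line_no + 1, stats)
        for x in cur:  # innermost loop contains no cache code
            total += x
    return total


def gather_irregular(mem, indices, stats, n_sets=2):
    """Irregular path (assumed direct-mapped): one lookup per access."""
    tags = [None] * n_sets
    lines = [None] * n_sets
    out = []
    for i in indices:
        stats.lookups += 1
        line_no = i // LINE
        s = line_no % n_sets
        if tags[s] != line_no:  # miss: replace the line in this set
            lines[s] = dma_get(mem, line_no, stats)
            tags[s] = line_no
        out.append(lines[s][i % LINE])
    return out
```

For a 10-element array, both summation routines transfer the same three lines, but the prefetched version performs one tag check per line rather than one per element. The actual saving on the Cell BE depends on how much of the transfer latency the compiler manages to overlap with computation, which this model does not measure.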

Journal: IEEE Transactions on Parallel and Distributed Systems
Publication Date: Apr 1, 2010
Citations: 21

Similar Papers

Hybrid access-specific software cache techniques for the cell BE architecture
Marc Gonzàlez ... Alexandre E Eichenberger
25 Oct 2008

Automatic Pre-Fetch and Modulo Scheduling Transformations for the Cell BE Architecture
Nikola Vujić ... Xavier Martorell
01 Jan 2008

Adaptive and Speculative Memory Consistency Support for Multi-core Architectures with On-Chip Local Memories
Nikola Vujic ... Eduard Ayguadé
01 Jan 2009

Orchestrating data transfer for the cell/B.E. processor
Tong Chen ... Tao Zhang
07 Jun 2008
