Hiding latency through bulk transfer and prefetching in distributed shared memory multiprocessors

Yangwoo Roh, Daeyeon Park, Byeong Hag Seong

doi:10.1109/hpc.2000.846540

Abstract

Distributed shared memory (DSM) machines provide the shared memory paradigm and achieve high performance by caching shared data. However, they suffer from cache miss and remote access latency when access patterns are coarse-grained. In this paper we propose combining bulk transfer and prefetching as a new latency hiding technique for DSM machines. Bulk transfer replicates remote data into local memory and thus reduces remote accesses; adaptive granularity (AG) is used for the bulk transfer. Prefetching is then added to bring the replicated data into the cache at the right time. Because bulk transfer converts remote accesses into local ones, simple prefetch scheduling, as in uniprocessors, can be applied. Simulation results show reduced latency and indicate the potential of AG as a preferable architecture for prefetching in DSM machines.
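The two-step idea in the abstract can be illustrated with a minimal C sketch. It is not the authors' implementation: `bulk_transfer`, `sum_with_prefetch`, and `PREFETCH_DISTANCE` are hypothetical names, `memcpy` merely stands in for a bulk transfer that a real DSM machine would perform in its protocol hardware or runtime, and `__builtin_prefetch` is the GCC/Clang software prefetch builtin, assumed available here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical prefetch distance in elements; a real value would be tuned
   to local memory latency and the cost of the loop body. */
#define PREFETCH_DISTANCE 16

/* Stand-in for a bulk transfer: replicate a whole remote region into local
   memory in one operation. memcpy is only an illustration; a DSM machine
   would move the data through its coherence protocol or runtime. */
static void bulk_transfer(double *local, const double *remote, size_t n) {
    assert(local != NULL && remote != NULL);
    memcpy(local, remote, n * sizeof(double));
}

/* Consume the local replica while prefetching a fixed distance ahead.
   Because every access is now local, the schedule is the same simple
   fixed-distance one used on a uniprocessor. */
static double sum_with_prefetch(const double *local, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* rw = 0 (read), locality = 3 (keep in all cache levels) */
            __builtin_prefetch(&local[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += local[i];
    }
    return total;
}
```

A caller would invoke `bulk_transfer` once per region and then run the compute loop over the local copy. The fixed prefetch distance reflects the abstract's point that the scheduling no longer has to account for variable remote latency.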
