Abstract

A number of highly-threaded, many-core architectures hide memory-access latency by low-overhead context switching among a large number of threads. The speedup of a program on these machines depends on how well that latency is hidden. If the number of threads were infinite, these machines could, in theory, provide the performance predicted by a PRAM analysis of the program. However, the number of threads per processor is not infinite; it is constrained by both hardware and algorithmic limits. In this paper, we introduce the Threaded Many-core Memory (TMM) model, which captures the important characteristics of these highly-threaded, many-core machines. Because the model incorporates key machine parameters, we expect analysis under it to yield a finer-grained and more accurate performance prediction than PRAM analysis. We analyze four algorithms for the classic all-pairs shortest paths problem under this model and find that even when two algorithms have the same PRAM performance, our model predicts different performance for some settings of the machine parameters. For example, on dense graphs, the dynamic programming algorithm and Johnson's algorithm have the same performance in the PRAM model, yet our model predicts different performance once the memory-access latency is large enough, validating the intuition that the dynamic programming algorithm performs better on these machines. We validate several predictions made by the model using empirical measurements on one instantiation of a highly-threaded, many-core machine, the NVIDIA GTX 480.
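For reference, the dynamic programming algorithm for all-pairs shortest paths mentioned above is the classic Floyd-Warshall recurrence. The sketch below is a minimal sequential version in Python; the function name and adjacency-matrix representation are illustrative only and not taken from the paper, which analyzes GPU implementations:

```python
import math

def floyd_warshall(dist):
    """In-place Floyd-Warshall all-pairs shortest paths.

    dist: n x n matrix where dist[i][j] is the weight of edge (i, j),
    math.inf if the edge is absent, and 0 on the diagonal.
    """
    n = len(dist)
    for k in range(n):          # intermediate vertex considered this round
        for i in range(n):      # source vertex
            for j in range(n):  # destination vertex
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

# Example: a 3-vertex directed graph.
INF = math.inf
d = [[0,   4, INF],
     [INF, 0, 1],
     [2, INF, 0]]
floyd_warshall(d)
# d[0][2] is now 5 (path 0 -> 1 -> 2)
```

The triply nested loop makes the Theta(n^3) work of the dynamic programming approach explicit; its regular, dense memory-access pattern is one intuitive reason it maps well onto latency-hiding many-core hardware.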

Highlights

  • A Memory Access Model for Highly-threaded Many-core Architectures: many-core architectures are excellent at hiding memory-access latency by low-overhead context switching among a large number of threads

  • Highly-threaded, many-core devices such as GPUs have gained popularity in the last decade; both NVIDIA and AMD manufacture general-purpose GPUs that fall into this category


Summary

A Memory Access Model for Highly-threaded Many-core Architectures

Many-core architectures are excellent at hiding memory-access latency by low-overhead context switching among a large number of threads. If the number of threads were infinite, these machines could theoretically provide the performance predicted by a PRAM analysis of the program. The number of allowable threads per processor, however, is not infinite. We introduce the Threaded Many-core Memory (TMM) model, which captures the important characteristics of these highly-threaded, many-core machines.

Recommended Citation: Ma, Lin; Agrawal, Kunal; and Chamberlain, Roger D., "A Memory Access Model for Highly-threaded Manycore Architectures", Report Number WUCSE-2012-64 (2012). Part of the Computer Engineering Commons and the Computer Sciences Commons. Available at Washington University Open Scholarship: https://openscholarship.wustl.edu/cse_research/89

Complete Abstract
INTRODUCTION
MODELING
Many-core Architectures
TMM Model Parameters
TMM Analysis structure
ANALYSIS OF ALL PAIRS SHORTEST PATHS ALGORITHMS USING TMM MODEL
Floyd-Warshall Algorithm
Johnson’s Algorithm
COMPARISON OF THE VARIOUS ALGORITHMS
Influence of Machine Parameters
Influence of Graph Size
Vertices Fit in Local Memory
Edges Fit in the Combined Local Memories
CONCLUSION
