A fast and accurate method for determining a lower bound on execution time

G Fursin, M F P O'Boyle, O Temam, G Watts

doi:10.1002/cpe.774

Abstract

In performance‐critical applications, memory latency is frequently the dominant overhead. In many cases, automatic compiler‐based optimizations to improve memory performance are limited, and programmers frequently resort to manual optimization techniques. However, this process is tedious and time‐consuming. Furthermore, as the potential benefit from optimization is unknown, there is no way to judge the amount of effort worth expending, nor when the process can stop, i.e. when optimal memory performance has been achieved or sufficiently approached. Architecture simulators can provide such information, but designing an accurate model of an existing architecture is difficult and simulation times are excessively long. In this article, we propose and implement a technique that is both fast and reasonably accurate for estimating a lower bound on execution time for scientific applications. This technique has been tested on a wide range of programs from the SPEC benchmark suite and two commercial applications, where it has been used to guide a manual optimization process and iterative compilation. We compare our technique with that of a simulator with an ideal memory behaviour and demonstrate that our technique provides comparable information on memory performance and yet is over two orders of magnitude faster. We further show that our technique is considerably more accurate than hardware counters. Copyright © 2004 John Wiley & Sons, Ltd.

Journal: Concurrency and Computation: Practice and Experience
Publication Date: Jan 7, 2004
Citations: 32

Similar Papers

Tuning linear algebra for energy efficiency on multicore machines by adapting the ATLAS library
Thomas Jakobs ... Paul Stöcker
Future Generation Computer Systems | VOL. 82
14 Mar 2017

Performance Analysis of NAS and SAN Storage for Scientific Workflow
Amol Jaikar ... Sangwook Bae
01 Feb 2016

L3C Model of High-Performance Computing Cluster for Scientific Applications
Alpana Rajan ... Brijendra Kumar Joshi
01 Jan 2018

Using Padding to Optimize Locality in Scientific Applications
E Herruzo ... O Plata
01 Jan 2008
