HAWS

Xun Gong, David Kaeli, Xiang Gong, Leiming Yu

doi:10.1145/3291050

Abstract

Graphics Processing Units (GPUs) have become an attractive choice for accelerating challenging applications on a range of platforms, from High Performance Computing (HPC) systems to full-featured smartphones. They can overcome computational barriers in a wide range of data-parallel kernels. GPUs hide pipeline stalls and memory latency through efficient thread preemption. However, as the number of on-chip computing cores grows and places greater demands on the memory hierarchy, it has become increasingly difficult to hide all of these stalls. In this article, we propose a novel Hint-Assisted Wavefront Scheduler (HAWS) to bypass long-latency stalls. HAWS starts by enhancing a compiler infrastructure to identify opportunities to bypass memory stalls. HAWS includes a wavefront scheduler that can continue executing instructions speculatively in the shadow of a memory stall, guided by compiler-generated hints. In doing so, HAWS increases utilization of GPU resources by aggressively fetching and executing instructions speculatively. Based on our simulation results on the AMD Southern Islands GPU architecture, at an estimated cost of 0.4% of total chip area, HAWS can improve application performance by 14.6% on average for memory-intensive applications.
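
The general idea of continuing to issue instructions in the shadow of a memory stall can be illustrated with a toy cycle-count model. The sketch below is not the HAWS design: the `Instr` record, the boolean `hint` flag, the single-wavefront setup, the fixed load latency, and the assumption that every register is written exactly once are all hypothetical simplifications made for illustration, and misspeculation and recovery are not modeled at all.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str              # register written (assumed written only once)
    srcs: tuple = ()       # registers read
    is_load: bool = False  # loads take `load_latency` cycles, others take 1
    hint: bool = False     # hypothetical compiler hint: may issue past a stall


def run(program, load_latency=10, use_hints=False):
    """Return the cycle count for one wavefront in this toy model."""
    # Registers produced by the program are unavailable until issued;
    # any other register is treated as a ready input.
    ready_at = {ins.dest: math.inf for ins in program}
    done = [False] * len(program)

    def is_ready(i, cycle):
        return all(ready_at.get(s, 0) <= cycle for s in program[i].srcs)

    cycle, pc = 0, 0
    while pc < len(program):
        if done[pc]:
            pc += 1  # skip instructions already issued under a stall
            continue
        issue = None
        if is_ready(pc, cycle):
            issue = pc
        elif use_hints:
            # Oldest instruction is stalled: look ahead for a hinted
            # instruction whose operands are already available.
            for j in range(pc + 1, len(program)):
                if not done[j] and program[j].hint and is_ready(j, cycle):
                    issue = j
                    break
        if issue is not None:
            ins = program[issue]
            ready_at[ins.dest] = cycle + (load_latency if ins.is_load else 1)
            done[issue] = True
        cycle += 1  # at most one instruction issues per cycle
    return max(cycle, max(ready_at.values()))


# A load, one dependent instruction, and two hinted independent ones.
example = [
    Instr("a", ("p",), is_load=True),
    Instr("b", ("a",)),
    Instr("c", ("x",), hint=True),
    Instr("d", ("c",), hint=True),
    Instr("e", ("b", "d")),
]

if __name__ == "__main__":
    print("in-order:", run(example, use_hints=False))
    print("hinted:  ", run(example, use_hints=True))
```

In this model the hinted run finishes in fewer cycles than the in-order run because the two independent instructions issue while the load is outstanding. A real GPU also switches among many wavefronts, which this sketch leaves out.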

Journal: ACM Transactions on Architecture and Code Optimization
Publication Date: Apr 18, 2019
Citations: 10

Similar Papers

Hint-assisted scheduling on modern GPUs
Xun Gong
10 May 2021

A Massively Parallel Reservoir Simulator on the GPU Architecture
Abdulrahman Manea ... Maitham Alhubail
19 Oct 2021

Optimizations in GPU: Smart compilers and core-level reconfiguration
Deming Chen
01 Jun 2013

Accelerating genetic algorithms with GPU computing: A selective overview
John Runwei Cheng ... Mitsuo Gen
Computers & Industrial Engineering | VOL. 128
29 Dec 2018
