High-performance Graphics Processing Units Research Articles

Graphics processing units (GPUs) have recently emerged as a new vehicle for high-performance, general-purpose computing. It is attractive to harness the power of GPUs for Electronic Design Automation (EDA) computations to cut the design turnaround time of VLSI systems. EDA algorithms, however, generally depend on irregular data structures such as sparse matrices and graphs, which pose major challenges for efficient GPU implementation. In this paper, we propose high-performance GPU implementations for a set of important irregular EDA computing patterns, including sparse matrix computations, graph algorithms, and message-passing algorithms. In the sparse matrix domain, we solve a core problem, the sparse matrix-vector product (SMVP). On a wide range of EDA problem instances, our SMVP implementation outperforms all prior work and achieves a speedup of up to 50× over the CPU baseline implementation. The GPU-based SMVP procedure is applied to accelerate two core EDA computing engines, timing analysis and linear system solution. In the graph algorithm domain, we developed an SMVP-based formulation to efficiently solve the breadth-first search (BFS) problem on GPUs. We also developed efficient solutions for two message-passing algorithms, survey propagation (SP) based SAT solving and register-transfer level (RTL) simulation. Our results demonstrate that GPUs have strong potential to accelerate EDA computing through the design of GPU-friendly algorithms and/or the reorganization of the computing structures of sequential algorithms.
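To make the two central ideas of this abstract concrete, here is a minimal sequential sketch in Python. It is not the paper's GPU implementation. The abstract does not name a storage format or give the BFS formulation, so the compressed sparse row (CSR) layout, the function names `csr_matvec` and `bfs_levels`, and the frontier-vector formulation are all assumptions chosen for illustration. The graph is assumed to be undirected, so row v of the adjacency matrix lists the neighbors of v.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Sparse matrix-vector product y = A x, with A stored in CSR form.

    row_ptr[i]..row_ptr[i+1] delimits the nonzeros of row i;
    col_idx and values hold their column indices and values.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def bfs_levels(row_ptr, col_idx, source):
    """BFS in which each step is one sparse matrix-vector product.

    The frontier is a 0/1 vector; multiplying the adjacency matrix by it
    marks every vertex adjacent to the frontier. Vertices that were not
    visited before form the next frontier. (Assumed formulation.)
    """
    n = len(row_ptr) - 1
    ones = [1.0] * len(col_idx)      # unweighted adjacency matrix
    level = [-1] * n                 # -1 means "not reached"
    level[source] = 0
    frontier = [0.0] * n
    frontier[source] = 1.0
    depth = 0
    while any(frontier):
        depth += 1
        reached = csr_matvec(row_ptr, col_idx, ones, frontier)
        frontier = [0.0] * n
        for v in range(n):
            if reached[v] > 0 and level[v] == -1:
                level[v] = depth
                frontier[v] = 1.0
    return level


# 3x3 example matrix [[2, 0, 1], [0, 3, 0], [4, 0, 5]] in CSR form.
A_ROW_PTR = [0, 2, 3, 5]
A_COL_IDX = [0, 2, 1, 0, 2]
A_VALUES = [2.0, 1.0, 3.0, 4.0, 5.0]

# Undirected example graph with edges 0-1, 0-2, 1-3, 3-4.
G_ROW_PTR = [0, 2, 4, 5, 7, 8]
G_COL_IDX = [1, 2, 0, 3, 0, 1, 4, 3]

if __name__ == "__main__":
    print(csr_matvec(A_ROW_PTR, A_COL_IDX, A_VALUES, [1.0, 2.0, 3.0]))
    print(bfs_levels(G_ROW_PTR, G_COL_IDX, 0))
```

The inner loop over a row's nonzeros has a different trip count for every row, which is one way to see why such irregular structures are hard to map onto GPU threads that execute in lockstep.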


The use of modern, high-performance graphics processing units (GPUs) for acceleration of scientific computation has been widely reported. The majority of this work has used the CUDA programming model, supported exclusively by GPUs manufactured by NVIDIA. An industry standardisation effort has recently produced the OpenCL specification for GPU programming. This offers the benefits of hardware independence and reduced dependence on proprietary tool-chains. Here we describe a source-to-source translation tool, "Swan", for facilitating the conversion of an existing CUDA code to use the OpenCL model, as a means to aid programmers experienced with CUDA in evaluating OpenCL and alternative hardware. While the performance of equivalent OpenCL and CUDA code on fixed hardware should be comparable, we find that a real-world CUDA application ported to OpenCL exhibits an overall 50% increase in runtime, a reduction in performance attributable to the immaturity of contemporary compilers. The ported application is shown to have platform independence, running on both NVIDIA and AMD GPUs without modification. We conclude that OpenCL is a viable platform for developing portable GPU applications but that the more mature CUDA tools continue to provide the best performance.

Program summary
Program title: Swan
Catalogue identifier: AEIH_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIH_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: GNU Public License version 2
No. of lines in distributed program, including test data, etc.: 17 736
No. of bytes in distributed program, including test data, etc.: 131 177
Distribution format: tar.gz
Programming language: C
Computer: PC
Operating system: Linux
RAM: 256 Mbytes
Classification: 6.5
External routines: NVIDIA CUDA, OpenCL
Nature of problem: Graphics processing units (GPUs) from NVIDIA are preferentially programmed with the proprietary CUDA programming toolkit. An alternative programming model promoted as an industry standard, OpenCL, provides similar capabilities to CUDA and is also supported on non-NVIDIA hardware (including multicore x86 CPUs, AMD GPUs and IBM Cell processors). The adaptation of a program from CUDA to OpenCL is relatively straightforward but laborious. The Swan tool facilitates this conversion.
Solution method: Swan performs a translation of CUDA kernel source code into an OpenCL equivalent. It also generates the C source code for entry point functions, simplifying kernel invocation from the host program. A concise host-side API abstracts the CUDA and OpenCL APIs. A program adapted to use Swan has no dependency on the CUDA compiler for the host-side program. The converted program may be built for either CUDA or OpenCL, with the selection made at compile time.
Restrictions: No support for CUDA C++ features
Running time: Nominal
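The kind of kernel translation described for Swan can be illustrated with a toy rewriter in Python. This is not Swan's code and says nothing about how Swan works internally. It only applies a handful of well-known CUDA-to-OpenCL spelling correspondences with regular expressions. The function name `translate_kernel` and the choice of mappings are assumptions made for this sketch, and a real translator would need a proper parser.

```python
import re

# A small, illustrative subset of CUDA -> OpenCL correspondences.
# Only the x dimension is handled in this sketch.
CUDA_TO_OPENCL = [
    (r"__global__\s+void", "__kernel void"),
    (r"__shared__", "__local"),
    (r"__syncthreads\(\)", "barrier(CLK_LOCAL_MEM_FENCE)"),
    (r"threadIdx\.x", "get_local_id(0)"),
    (r"blockIdx\.x", "get_group_id(0)"),
    (r"blockDim\.x", "get_local_size(0)"),
    (r"gridDim\.x", "get_num_groups(0)"),
]


def _qualify_pointer_params(match):
    """Prefix pointer parameters of a kernel with __global.

    OpenCL requires an address-space qualifier on pointer kernel
    arguments; this sketch assumes they all live in global memory.
    """
    params = [p.strip() for p in match.group(2).split(",")]
    params = ["__global " + p if "*" in p else p for p in params]
    return match.group(1) + "(" + ", ".join(params) + ")"


def translate_kernel(cuda_source):
    """Rewrite a simple CUDA kernel into OpenCL C (hypothetical helper)."""
    out = cuda_source
    for pattern, replacement in CUDA_TO_OPENCL:
        out = re.sub(pattern, replacement, out)
    # Runs after the table above, so the kernel is already "__kernel void".
    out = re.sub(r"(__kernel void \w+)\(([^)]*)\)", _qualify_pointer_params, out)
    return out


CUDA_EXAMPLE = (
    "__global__ void scale(float *data, float factor) {\n"
    "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
    "    data[i] *= factor;\n"
    "}\n"
)

if __name__ == "__main__":
    print(translate_kernel(CUDA_EXAMPLE))
```

Even this small example hints at why the summary calls manual conversion "straightforward but laborious": each rewrite is mechanical, but every kernel signature, index expression and synchronisation call has to be touched.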


Articles published on High-performance Graphics Processing Units

A GPU implementation of inclusion-based points-to analysis

A Web-Lab Environment for the Study of the Job Shop Problem

Towards a hybrid NMF-based neural approach for face recognition on GPUs

Towards accelerating irregular EDA applications with GPUs

Swan: A tool for porting CUDA programs to OpenCL

NBSymple, a double parallel, symplectic N-body code running on graphic processing units

Code Generation: A Strategy for Neural Network Simulators

Use of High-performance Graphics Processing Units for Power System Demand Forecasting

Exploiting graphics processing units for computational biology and bioinformatics

