Abstract
In this paper, the authors present several self-developed implementation variants of Discrete Wavelet Transform (DWT) computation algorithms and compare their execution times against commonly accepted implementations on representative modern Graphics Processing Unit (GPU) architectures. The proposed solutions avoid the time-consuming modulo divisions and conditional instructions used for DWT filter wrapping by a proper expansion of the DWT's input data vectors. The main goal of the research is to improve the computation times of popular DWT algorithms on representative modern GPU architectures while retaining the code's clarity and simplicity. The execution time improvements obtained on GPUs are also compared with their counterparts on traditional sequential processors. The experimental study shows that the proposed implementations, in the case of parallel realization on GPUs, are characterized by very simple kernel code and high execution time performance.
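The filter-wrapping idea described above can be sketched as follows. This is an illustrative example, not the authors' actual GPU kernels: the function names, the filter coefficients in the usage below, and the choice of a one-level decimated DWT are assumptions made for demonstration. The point it shows is that extending the input periodically by (filter length − 2) samples lets the inner loop use plain indexing instead of a modulo operation.

```python
import numpy as np

def dwt_level_modulo(x, h, g):
    """One DWT level with periodic wrapping via modulo indexing
    (the conventional approach the paper seeks to avoid)."""
    n, m = len(x), len(h)
    lo = np.zeros(n // 2)
    hi = np.zeros(n // 2)
    for i in range(n // 2):
        for k in range(m):
            lo[i] += h[k] * x[(2 * i + k) % n]  # modulo wrap on every access
            hi[i] += g[k] * x[(2 * i + k) % n]
    return lo, hi

def dwt_level_expanded(x, h, g):
    """Same transform, but the input is first extended periodically by
    (m - 2) samples, so the inner loop needs no modulo or branching --
    the expansion idea described in the abstract (sketch, assumed detail)."""
    n, m = len(x), len(h)
    xe = np.concatenate([x, x[:m - 2]])  # periodic expansion of the input
    lo = np.zeros(n // 2)
    hi = np.zeros(n // 2)
    for i in range(n // 2):
        for k in range(m):
            lo[i] += h[k] * xe[2 * i + k]  # plain, branch-free indexing
            hi[i] += g[k] * xe[2 * i + k]
    return lo, hi
```

The two variants produce identical coefficients for any low-pass/high-pass filter pair; on GPUs, removing the modulo and any boundary branches from the inner loop is what keeps the kernel code simple and fast.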
Highlights
Digital signal processing (DSP) has become an integral part of everyday life
We present several optimization variants of commonly used Discrete Wavelet Transform (DWT) computation algorithms, namely the matrix-based and the lattice structure-based approaches, and compare their execution time effectiveness for both CPU and Graphics Processing Unit (GPU) implementations
The results indicate that, despite a twofold reduction in computational complexity of the lattice structure-based approach in comparison with the matrix-based method, the former algorithm performs significantly worse for large transform sizes due to its more complex computational structure when implemented on GPUs
Summary
The increasing number of electronic devices has led to a situation where almost everyone has to deal with digitally processed data. Current processors are often optimized to extreme physical operational conditions; for example, the widths of the electric paths are regularly close to atomic size. This creates the need to look for new techniques to increase computational efficiency, since increasing the clock frequency meets physical barriers. Those are only a few of the reasons why parallel computing is becoming more and more popular [3, 4]. The conversion of traditional, sequential computation algorithms to their parallel counterparts requires suitable implementations and poses a real challenge for software engineers involved in algorithm optimization [5].