Abstract
In this paper, the authors present several self-developed implementation variants of Discrete Wavelet Transform (DWT) computation algorithms and compare their execution times against commonly accepted implementations on representative modern Graphics Processing Unit (GPU) architectures. The proposed solutions avoid the time-consuming modulo divisions and conditional instructions used for DWT filter wrapping by a proper expansion of the DWT's input data vectors. The main goal of the research is to improve the computation times of popular DWT algorithms on representative modern GPU architectures while retaining the code's clarity and simplicity. The execution time improvements obtained on GPUs are also compared with their counterparts on traditional sequential processors. The experimental study shows that the proposed implementations, in the case of parallel realization on GPUs, are characterized by very simple kernel code and high execution time performance.
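The filter-wrapping idea described above can be sketched as follows. This is an illustrative example, not the authors' actual GPU kernels: the function names, the filter coefficients in the usage below, and the choice of a one-level decimated DWT are assumptions made for demonstration. The point it shows is that extending the input periodically by (filter length − 2) samples lets the inner loop use plain indexing instead of a modulo operation.

```python
import numpy as np

def dwt_level_modulo(x, h, g):
    """One DWT level with periodic wrapping via modulo indexing
    (the conventional approach the paper seeks to avoid)."""
    n, m = len(x), len(h)
    lo = np.zeros(n // 2)
    hi = np.zeros(n // 2)
    for i in range(n // 2):
        for k in range(m):
            lo[i] += h[k] * x[(2 * i + k) % n]  # modulo wrap on every access
            hi[i] += g[k] * x[(2 * i + k) % n]
    return lo, hi

def dwt_level_expanded(x, h, g):
    """Same transform, but the input is first extended periodically by
    (m - 2) samples, so the inner loop needs no modulo or branching --
    the expansion idea described in the abstract (sketch, assumed detail)."""
    n, m = len(x), len(h)
    xe = np.concatenate([x, x[:m - 2]])  # periodic expansion of the input
    lo = np.zeros(n // 2)
    hi = np.zeros(n // 2)
    for i in range(n // 2):
        for k in range(m):
            lo[i] += h[k] * xe[2 * i + k]  # plain, branch-free indexing
            hi[i] += g[k] * xe[2 * i + k]
    return lo, hi
```

The two variants produce identical coefficients for any low-pass/high-pass filter pair; on GPUs, removing the modulo and any boundary branches from the inner loop is what keeps the kernel code simple and fast.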
Highlights
Digital signal processing (DSP) has become an integral part of everyday life
We present several optimization variants of commonly used Discrete Wavelet Transform (DWT) computation algorithms, namely the matrix-based and the lattice structure-based approaches, and compare their execution time effectiveness for both CPU and Graphics Processing Unit (GPU) implementations
The results indicate that, despite a twofold reduction in computational complexity of the lattice structure-based approach in comparison with the matrix-based method, the former algorithm performs significantly worse for large transform sizes due to its more complex computational structure when implemented on GPUs
Summary
The increasing number of electronic devices has led to a situation where almost everyone has to deal with digitally processed data. Current processors are often optimized to extreme physical operational conditions; for example, the widths of the electric paths are regularly close to atomic size. This creates the need to look for new techniques to increase computational efficiency, since increasing the clock frequency meets physical barriers. Those are only a few of the reasons why parallel computing is becoming more and more popular [3, 4]. The conversion of traditional, sequential computation algorithms to their parallel counterparts requires suitable implementations and poses a real challenge for software engineers involved in algorithm optimization [5].