Efficient Partitioning of Algorithms for Long Convolutions and their Mapping onto Architectures

Laurens Bierens, Ed Deprettere

doi:10.1023/a:1007993310185

Abstract

We present an efficient approach for the partitioning of algorithms implementing long convolutions. The dependence graph (DG) of a convolution algorithm is locally sequential globally parallel (LSGP) partitioned into smaller, less complex convolution algorithms. The LSGP partitioned DG is mapped onto a signal flow graph (SFG), in which each processor element (PE) performs a small convolution algorithm. The key is then to reduce the complexity of the SFG in two steps: (1) local reduction of complexity: the short Fast Fourier Transform (FFT) is used to perform the small convolution within the PE; and (2) global reduction of complexity: the short FFTs within the PEs are relocated to the global level, where redundant short FFT operations are eliminated. The remaining operation within the PEs is now a simple element-wise multiply-add. After a graph transform, the structure of the SFG kernel is recognized as a set of parallel small convolutions. If we use the short FFT to perform these short convolutions, we arrive at our final realization of the long convolution algorithm. The computational complexity of this realization is close to the optimum for convolutions, that is, O(N log N). Our approach thus achieves this O(N log N) complexity without having to implement large-size FFTs. We use, instead, small FFT blocks. The advantage is that small FFT transforms are commercially available, and that they can even be implemented in single-chip VLSI architectures. Our final SFG is three-dimensional and can be mapped efficiently onto prototype architectures or dedicated VLSI processors. We demonstrate the procedure in the paper by a design example: the implementation of a prototype convolution architecture that we designed for a real-time radar imaging system.
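The two reduction steps described in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' architecture: it assumes a block length that is a power of two, uses a plain recursive radix-2 FFT, and all function names (`fft`, `blocks_to_spectra`, `partitioned_convolve`) are hypothetical. Each block is transformed once at the "global" level, the inner loop is only an element-wise multiply-add, and the accumulation over block indices is done directly here rather than with a further short FFT.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT (unscaled); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(a):
    """Inverse FFT: conjugate-twiddle transform scaled by 1/n."""
    n = len(a)
    return [v / n for v in fft(a, invert=True)]


def direct_convolve(x, h):
    """Reference O(N*M) linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def blocks_to_spectra(signal, block):
    """Split into length-`block` pieces, zero-pad each to 2*block, FFT once."""
    spectra = []
    for start in range(0, len(signal), block):
        piece = list(signal[start:start + block])
        piece += [0.0] * (2 * block - len(piece))
        spectra.append(fft(piece))
    return spectra


def partitioned_convolve(x, h, block=4):
    """Long convolution built from short FFTs of size 2*block (assumed power of two)."""
    size = 2 * block
    # Global level: every block is transformed exactly once (no redundant FFTs).
    X = blocks_to_spectra(x, block)
    H = blocks_to_spectra(h, block)
    # PE level: element-wise multiply-add. For each frequency bin this is a
    # short convolution over block indices, computed directly in this sketch.
    Y = [[0j] * size for _ in range(len(X) + len(H) - 1)]
    for i, Xi in enumerate(X):
        for j, Hj in enumerate(H):
            acc = Y[i + j]
            for f in range(size):
                acc[f] += Xi[f] * Hj[f]
    # One inverse short FFT per output block, then overlap-add at offset k*block.
    out = [0.0] * ((len(Y) + 1) * block)
    for k, Yk in enumerate(Y):
        yk = ifft(Yk)
        for n in range(size):
            out[k * block + n] += yk[n].real
    return out[:len(x) + len(h) - 1]
```

Calling `partitioned_convolve(x, h, block=4)` on two real-valued lists should agree with `direct_convolve(x, h)` up to floating-point rounding. Because the per-bin accumulation is itself a short convolution over block indices, it could in turn be computed with short FFTs, which is the step the abstract names as leading to the final realization.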

Journal: The Journal of VLSI Signal Processing Systems for Signal, Image, and Video Technology
Publication Date: Jan 1, 1998
Citations: 14

Similar Papers

A Modified Signal Flow Graph and Corresponding Conflict-Free Strategy for Memory-Based FFT Processor Design
Yinghui Tian ... Qi Shen
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing | VOL. 66
01 Jan 2019

Design of run-time fault-tolerant arrays of self-checking processing elements
J Franzen
05 Sep 1990

Systolic Array Synthesis using SFG Representations
C R Wan ... D J Evans
International Journal of Computer Mathematics | VOL. 79
01 Jan 2002

Implementation and evaluation of parallel FFT on Engineering and Scientific Computation Accelerator (ESCA) architecture
Dan Wu ... Jin-Li Rao
Journal of Zhejiang University SCIENCE C | VOL. 12
01 Dec 2011
