Generic Multiphase Software Pipelined Partial FFT on Instruction Level Parallel Architectures

Min Li, B Bougard, L Van Der Perre, F Catthoor, T Carlson, D Novo

doi:10.1109/tsp.2008.2010422

Abstract

The partial fast Fourier transform (PFFT) is an extension of the fast Fourier transform (FFT) in which only part of the input or output bins are used. By pruning useless data flow, it is possible to achieve a significant speedup in many important applications. Although theoretical aspects of the PFFT have been thoroughly studied over the past three decades, efficient and generic implementations have rarely been reported. The most important obstacle to optimizing the PFFT is its highly irregular data flow and the associated control flow. In addition, a size-$N$ PFFT has $2^{N}$ possible data flow patterns, so finding a flexible yet efficient implementation is very challenging. Our contribution is a generic method to map the highly irregular data flow of an arbitrary PFFT onto instruction level parallel architectures using software pipelining. By leveraging the algorithmic-level flexibility of an FFT, we select an appropriate data flow variant that enables aggressive optimizations in the implementation. Then, we apply a divide and conquer strategy, partitioning the PFFT into three phases. For each phase, we introduce specialized control structures, loop structures, address generation schemes and memory operations. This reduces the cycle count, the number of executed instructions and the number of memory accesses. By studying ten representative benchmarks from wireless baseband applications, we are able to produce repeatable and successful results on the TMS320C6000. Compared with two optimized FFT implementations, our work reduces the cycle count by 20.5% to 87.5%, executed instructions by 11.2% to 86.5%, and L1D and L1P cache accesses by 16.1% to 79.4% and 19.5% to 87.1%, respectively. To the best of our knowledge, this is the first reported work on a generic software pipelined PFFT for instruction level parallel architectures.
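The idea of pruning unused data flow can be illustrated with a minimal sketch. The code below is not the three-phase software pipelined method described in the abstract; it is a plain recursive radix-2 decimation-in-time FFT, written only for illustration, that skips every butterfly not needed for a chosen subset of output bins. The names `pruned_fft` and `dft_bin` are hypothetical, and the input length is assumed to be a power of two.

```python
import cmath

def pruned_fft(x, wanted):
    """Radix-2 decimation-in-time FFT that evaluates only the bins in `wanted`.

    Returns (bins, mults): a dict {k: X[k]} and the number of twiddle
    products performed. len(x) must be a power of two.
    """
    n = len(x)
    if n == 1:
        return {0: x[0]}, 0
    half = n // 2
    # Outputs k and k + half share the same pair of sub-transform bins.
    needed = {k % half for k in wanted}
    even, mults_even = pruned_fft(x[0::2], needed)
    odd, mults_odd = pruned_fft(x[1::2], needed)
    # One twiddle product per needed butterfly; unneeded ones are skipped.
    prod = {j: cmath.exp(-2j * cmath.pi * j / n) * odd[j] for j in needed}
    bins = {}
    for k in wanted:
        j = k % half
        bins[k] = even[j] + prod[j] if k < half else even[j] - prod[j]
    return bins, mults_even + mults_odd + len(needed)

def dft_bin(x, k):
    """Direct O(N) evaluation of one DFT bin, used only as a reference."""
    n = len(x)
    return sum(v * cmath.exp(-2j * cmath.pi * k * i / n) for i, v in enumerate(x))
```

Under this counting, which includes trivial twiddles equal to 1, a size-8 input needs 7 twiddle products when only bin 0 is requested and 12 when all eight bins are requested. Each distinct set of wanted bins yields a different pruned data flow, which is where the $2^{N}$ patterns mentioned in the abstract come from.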


Journal: IEEE Transactions on Signal Processing
Publication Date: Apr 1, 2009
Citations: 32

