Abstract

We have developed an efficient implementation to compute the 2D fast Fourier transform (FFT) on a new very long instruction word programmable mediaprocessor. Using instruction-level parallelism and a multimedia instruction set, our radix-4 Cooley–Tukey algorithm optimally maps the FFT computation to the processing resources of the Hitachi/Equator’s MAP mediaprocessor. We have also achieved more efficient data I/O and lower data transfer time compared to traditional implementations by processing several columns in parallel during the column-wise stage of row–column decomposition. We used a programmable direct memory access engine and a double-buffering scheme in the data cache to perform the computation and the data transfer in parallel. Our implementation resulted in 22.4 ms total execution time for a 512 × 512-point 2D complex FFT, which is faster than previous single-chip programmable or dedicated solutions. The implementations on two other mediaprocessors, the TriMedia TM1100 and the BOPS ManArray, illustrate the importance of the instruction set architecture for achieving high performance and the trend of data I/O becoming the limitation on the 2D FFT performance in newer mediaprocessors.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.