Abstract
One of the many interesting architectural features of the CRAY-2 supercomputer is that each processor has access to 16K 64-bit words of local memory. This is in addition to the extremely large, 268-million-word common memory that is accessible by all four processors. By using local memory judiciously, it is possible to achieve increased performance on the CRAY-2. This is partly because accesses to local memory can be done simultaneously with accesses to common memory and other operations. Also, it is slightly faster to start up a vector access to local memory, and a processor does not have to compete with other processors when accessing its local memory. In this paper, we present an algorithm for computing the fast Fourier transform that takes advantage of the CRAY-2's local memory. It operates by solving subproblems, which are themselves Fourier transforms, entirely within local memory. By doing so it achieves a performance increase of between 25 and 40 percent over an equivalent algorithm that uses only common memory, and for some input sizes is able to outperform the CRAY-2 library FFT.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.