Abstract

This paper concerns parallel computations with communication based on Remote Direct Memory Access (RDMA), which provides low-level, unbuffered access to the distributed memory of computational nodes. Fine-grain computation involves very frequent transmissions of small messages. To execute such transmissions efficiently over RDMA, a special memory infrastructure, rotating buffers (RB), is proposed. The organization of the buffers is adjusted to the program's needs in advance, before program execution. With additional synchronization between the processes involved, this allows intensive use of all communication resources available in the system. The proposed method is illustrated with a typical fine-grain problem, the discrete Fast Fourier Transform (FFT). “The Transpose Algorithm” of FFT has been implemented with the RDMA rotating buffers, and its efficiency is compared with a solution based on the standard message-passing library MPI.
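As a rough illustration of the general idea of rotating buffers, the following single-process Python sketch models a ring of preallocated slots that a sender fills and a receiver drains in rotation, with a per-slot flag standing in for the additional synchronization. It is not the paper's implementation: the class name `RotatingBuffers`, the methods `try_write` and `try_read`, and the flag scheme are all hypothetical, and no real RDMA transfer takes place.

```python
class RotatingBuffers:
    """Hypothetical ring of fixed-size slots shared by one sender and one receiver.

    In a real RDMA setting the slots would live in registered memory on the
    receiving node and the sender would write into them remotely; here both
    sides are simulated in one process.
    """

    def __init__(self, slots, slot_size):
        # All slots are allocated up front, before any message is sent.
        self.data = [bytearray(slot_size) for _ in range(slots)]
        self.length = [0] * slots
        self.full = [False] * slots  # True means the slot holds unread data
        self.write_idx = 0
        self.read_idx = 0

    def try_write(self, payload):
        """Copy payload into the next slot; return False if it cannot be used."""
        i = self.write_idx
        if self.full[i] or len(payload) > len(self.data[i]):
            return False  # slot not yet released by the receiver, or too small
        self.data[i][:len(payload)] = payload
        self.length[i] = len(payload)
        self.full[i] = True
        self.write_idx = (i + 1) % len(self.data)  # rotate to the next slot
        return True

    def try_read(self):
        """Return the next unread message, or None if the slot is empty."""
        i = self.read_idx
        if not self.full[i]:
            return None
        payload = bytes(self.data[i][:self.length[i]])
        self.full[i] = False  # release the slot back to the sender
        self.read_idx = (i + 1) % len(self.data)
        return payload
```

In this sketch, a sender that finds the next slot still marked full must wait until the receiver releases it, which is one simple way the extra synchronization between processes could look.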
