Abstract

Efficiently processing vast and ever-growing data volumes is a pressing challenge. Traditional high-performance computers, built on distributed-memory architectures and a message-passing model, struggle with synchronization overhead, which hampers their ability to keep pace with growing demands. Remote Memory Access (RMA), often referred to as one-sided MPI communication, offers a solution: a process can directly access another process's memory, eliminating explicit message exchange and significantly improving performance. Unfortunately, the existing MPI RMA standard lacks an interface for collective operations, which limits efficiency. To overcome this constraint, we introduce an algorithm design that enables efficient, parallelizable collective operations within the RMA framework. Our study focuses on the advantages of collective operations, using the broadcast algorithm as a case study. Initial performance tests show that our implementations surpass traditional methods, highlighting the promising potential of this technique.
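The key semantic difference the abstract describes is that in one-sided RMA the origin process deposits data directly into memory the target has exposed, with no matching receive call on the target. The minimal sketch below illustrates those put-style semantics in Python, using `multiprocessing.shared_memory` as a stand-in for an MPI window; it is an analogy of the concept only, not the MPI RMA API or the paper's broadcast algorithm, and all names in it (`origin_put`, `run_demo`) are illustrative.

```python
from multiprocessing import get_context, shared_memory

def origin_put(win_name: str, payload: bytes) -> None:
    # The "origin" process attaches to the exposed region and writes into
    # it directly -- the analogue of MPI_Put: the target posts no receive.
    win = shared_memory.SharedMemory(name=win_name)
    win.buf[: len(payload)] = payload
    win.close()

def run_demo() -> bytes:
    payload = b"broadcast-me"
    # The "target" exposes a region of memory for others to access,
    # loosely playing the role of an MPI window (MPI_Win_create).
    win = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        ctx = get_context("fork")  # fork keeps this sketch self-contained
        p = ctx.Process(target=origin_put, args=(win.name, payload))
        p.start()
        p.join()  # stands in for closing an RMA synchronization epoch
        return bytes(win.buf[: len(payload)])
    finally:
        win.close()
        win.unlink()

if __name__ == "__main__":
    print(run_demo())  # b'broadcast-me'
```

The target only synchronizes (here, the `join`); it never names the data or its sender, which is the property that lets RMA-based collectives avoid the matching-message overhead the abstract refers to.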
