Abstract

Remote direct memory access (RDMA) and non-uniform memory access (NUMA) are critical technologies of modern high-performance computing platforms. RDMA allows nodes to directly access memory on remote machines. Multiprocessor architectures implement NUMA to scale up memory access performance. When paired together, these technologies exhibit performance penalties under certain configurations. This paper is the first study to explore these configurations and provide quantitative findings on the impact of NUMA on RDMA-based systems. One consequence of ultra-fast networks is that the known implications of NUMA locality now have a higher relative impact on the performance of RDMA-enabled distributed systems. Our study quantifies the role of NUMA locality and uncovers unexpected behavior. In summary, poor NUMA locality of remotely accessible memory can lead to an automatic 20% performance degradation. Additionally, local workloads operating on remotely accessible memory can exhibit a 300% performance gap depending on memory locality. Surprisingly, the configurations demonstrating this result contradict the presumed impact of NUMA locality. Our findings are validated using two generations of RDMA cards, a synthetic benchmark, and the popular application Memcached ported to RDMA.

