Abstract
Processing-in-memory (PIM) has been widely explored in academia and industry to accelerate numerous workloads. By reducing data movement and increasing parallelism, PIM offers high performance and energy efficiency. The large number of cores or nodes in a PIM system provides massive parallelism and compute throughput; however, it also poses challenges and limitations for some workloads. In this work, we provide an extensive evaluation and analysis of a real PIM system from UPMEM. We specifically target emerging workloads featuring collective communication and demonstrate that collective communication is the primary limitation of current PIM architectures.