Abstract

Heterogeneous memory (HM, also known as hybrid memory) has become popular in emerging parallel architectures due to its programming flexibility and energy efficiency. Unlike the traditional memory subsystem, HM consists of fast and slow components. Usually, the fast memory lacks hardware support, which puts extra burdens on programmers and compilers for explicit data placement. Thus, HM provides both opportunities and challenges with programming parallel codes. It is important to understand how to utilize HM and set expectations on the benefits of HM. Prior work principally uses simulators to study HM, which lacks the analysis on a real hardware. To address this issue, this paper experiments with a real system—the TI KeyStone II—to study HM. We make three contributions. First, we develop a set of parallel benchmarks to characterize the performance and power efficiency of HM. It is the first benchmark suite with OpenMP 4.0 features that is functional on real HM architectures. Second, we build a profiling tool to provide guidance for placing data in HM. Our tool analyzes memory access patterns and provides high-level feedback at the source-code level for optimization. Third, we apply the data placement optimization to our benchmarks and evaluate the effectiveness of HM in boosting performance and saving energy.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.