Abstract
Parallel computers now start to adopt Bandwidth-Asymmetric Memory architecture that consists of traditional DRAM memory and new High Bandwidth Memory (HBM) for high memory bandwidth. However, existing task schedulers suffer from low bandwidth usage and poor data locality problems in bandwidth-asymmetric memory architectures. To solve the two problems, we propose a Bandwidth and Locality Aware Task-stealing (BATS) system, which consists of an HBM-aware data allocator, a bandwidth-aware traffic balancer, and a hierarchical task-stealing scheduler. Leveraging compile-time code transformation and run-time data distribution, the data allocator enables HBM usage automatically without user interference. According to data access hotness, the traffic balancer migrates data to balance memory traffic across memory nodes proportional to their bandwidth. The hierarchical scheduler improves data locality at runtime without a priori program knowledge. Experiments on an Intel Knights Landing server that adopts bandwidth-asymmetric memory show that BATS reduces the execution time of memory-bound programs up to 83.5% compared with traditional task-stealing schedulers.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
More From: ACM Transactions on Architecture and Code Optimization
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.