Abstract
Memory capacity is a critical bottleneck for large-scale Deep Neural Network (DNN) applications. Hybrid Memory Systems (HMS) offer a promising way to increase memory capacity affordably. However, unlocking the performance of HMS depends heavily on data migration. A typical DNN application consists of many execution layers, each requiring distinct data objects. Deploying DNNs on HMS therefore poses significant challenges for data migration strategies and motivates smarter solutions. To tackle the data migration problem on HMS for DNN applications, we propose a runtime system that automatically optimizes scalable data migration and exploits DNN domain knowledge to decide data migrations between the fast and slow memories in HMS. To further improve data migration for DNN training, we introduce a reference distance and location based data management strategy (ReDL) that handles short-lived and long-lived data objects with Idle and Dynamic migration methods, respectively. Using ReDL, DNN training on HMS with a smaller fast memory can achieve performance close to a fast memory-only system. Experimental results show that with fast memory configured to 20% of each workload's peak memory consumption, our work achieves performance comparable to the fast memory-only system (at most a 9.6% performance difference). It further achieves an average of 19% and 11% improvement in data locality over state-of-the-art solutions.
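The core idea of ReDL described above can be sketched as follows: data objects are profiled by their reference distance (the gap between consecutive uses across execution layers), and short- versus long-lived objects are routed to different migration policies. This is a minimal illustrative sketch of that classification step only; all function names and the threshold are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of the ReDL classification idea: objects with a small
# maximum reference distance are treated as short-lived (Idle migration),
# others as long-lived (Dynamic migration). Names and the threshold value
# are illustrative assumptions, not the paper's actual runtime.

def reference_distances(access_trace):
    """Map each data object to the gaps (in layers) between its uses."""
    last_use, distances = {}, {}
    for layer, obj in access_trace:
        if obj in last_use:
            distances.setdefault(obj, []).append(layer - last_use[obj])
        last_use[obj] = layer
    return distances

def choose_policy(distances, threshold=2):
    """Short-lived objects -> 'idle' migration; long-lived -> 'dynamic'."""
    return {
        obj: "idle" if max(gaps) <= threshold else "dynamic"
        for obj, gaps in distances.items()
    }

# Toy access trace: (layer index, data object) pairs.
trace = [(0, "act0"), (1, "act0"), (1, "w1"), (5, "w1")]
policies = choose_policy(reference_distances(trace))
# "act0" is reused one layer later -> short-lived; "w1" four layers later -> long-lived.
```

In an actual HMS runtime, the chosen policy would drive when each object is prefetched into fast memory or evicted to slow memory.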