Demand Paging Research Articles

Contemporary discrete GPUs support rich memory management features such as virtual memory and demand paging. These features simplify GPU programming by providing a virtual address space abstraction similar to that of CPUs and by eliminating manual memory management, but they introduce high performance overheads during (1) address translation and (2) page faults. A GPU relies on high degrees of thread-level parallelism (TLP) to hide memory latency. Address translation can undermine TLP, as a single miss in the translation lookaside buffer (TLB) invokes an expensive serialized page table walk that often stalls multiple threads. Demand paging can also undermine TLP, as multiple threads often stall while they wait for an expensive data transfer over the system I/O (e.g., PCIe) bus when the GPU demands a page. In modern GPUs, the page size used for memory management creates a trade-off between address translation and demand paging. Address translation overhead is lower with a larger page size (e.g., 2MB large pages, compared with conventional 4KB base pages), which increases TLB coverage and thus reduces TLB misses. Conversely, demand paging overhead is lower with a smaller page size, which shortens each transfer over the system I/O bus. Support for multiple page sizes can help relax this trade-off so that address translation and demand paging optimizations work together synergistically. However, existing page coalescing (i.e., merging base pages into a large page) and splintering (i.e., splitting a large page into base pages) policies require costly base page migrations that undermine the benefits multiple page sizes provide.
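The page-size trade-off can be made concrete with back-of-the-envelope arithmetic: TLB coverage (reach) is the number of TLB entries times the page size, while the bus time for one demand-paged transfer grows with the page size. The Python sketch below assumes a hypothetical 512-entry TLB and a hypothetical 16 GB/s effective bus bandwidth; neither figure comes from the article, and the sketch ignores fixed per-transfer latency.

```python
# Hypothetical parameters for illustration only; not taken from the article.
KB = 1024
MB = 1024 * KB

TLB_ENTRIES = 512        # hypothetical TLB size (entries)
BUS_BANDWIDTH = 16e9     # hypothetical effective I/O bus bandwidth, bytes/s


def tlb_reach(entries: int, page_size: int) -> int:
    """Bytes of memory the TLB can map without a miss (entries x page size)."""
    return entries * page_size


def transfer_time_us(page_size: int, bandwidth_bytes_per_s: float) -> float:
    """Idealized time in microseconds to move one page over the bus
    (size / bandwidth), ignoring any fixed per-transfer latency."""
    return page_size / bandwidth_bytes_per_s * 1e6


for name, size in [("4KB base page", 4 * KB), ("2MB large page", 2 * MB)]:
    print(f"{name}: reach = {tlb_reach(TLB_ENTRIES, size) // MB} MB, "
          f"transfer = {transfer_time_us(size, BUS_BANDWIDTH):.2f} us")
```

Under these assumptions, moving from 4KB to 2MB pages multiplies TLB reach by 512 (from 2 MB to 1024 MB) but also makes each page transfer 512 times longer (about 0.26 µs versus about 131 µs), which is the tension the abstract describes.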
In this paper, we observe that GPGPU applications present an opportunity to support multiple page sizes without costly data migration, as the applications perform most of their memory allocation en masse (i.e., they allocate a large number of base pages at once). We show that this en masse allocation allows us to create intelligent memory allocation policies that ensure base pages contiguous in virtual memory are allocated to contiguous physical memory pages. As a result, coalescing and splintering operations no longer need to migrate base pages.
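The observation can be illustrated with a toy allocator. The Python sketch below is hypothetical and is not the paper's allocator: when a buffer is allocated en masse, it reserves one large-page-sized physical frame for each 2MB-aligned virtual region and places every 4KB base page at the same offset within that frame. Because virtually contiguous base pages then land in physically contiguous memory, coalescing a fully populated region into a large page only records a new mapping, and splintering only drops it; no base page is copied. The class, method, and field names are assumptions for illustration.

```python
BASE_PAGE = 4 * 1024                        # 4KB base page
LARGE_PAGE = 2 * 1024 * 1024                # 2MB large page
PAGES_PER_LARGE = LARGE_PAGE // BASE_PAGE   # 512 base pages per large page


class ContiguityAllocator:
    """Hypothetical sketch: each large-page-aligned virtual region gets its own
    physical large frame, and base page i of the region goes to slot i of it."""

    def __init__(self, num_large_frames: int) -> None:
        self.free_frames = list(range(num_large_frames))  # free physical large frames
        self.region_to_frame = {}   # virtual large-page number -> physical large frame
        self.page_table = {}        # virtual base-page number -> physical base-page number
        self.large_mappings = {}    # coalesced regions: virtual region -> physical frame

    def allocate(self, vaddr: int, size: int) -> None:
        """Map every base page of [vaddr, vaddr + size) in one call (en masse)."""
        first = vaddr // BASE_PAGE
        last = (vaddr + size - 1) // BASE_PAGE
        for vpn in range(first, last + 1):
            region, offset = divmod(vpn, PAGES_PER_LARGE)
            if region not in self.region_to_frame:
                if not self.free_frames:
                    raise MemoryError("no free large frames")
                self.region_to_frame[region] = self.free_frames.pop(0)
            frame = self.region_to_frame[region]
            self.page_table[vpn] = frame * PAGES_PER_LARGE + offset

    def can_coalesce(self, region: int) -> bool:
        """True if all base pages of the region are mapped, in order, into one
        aligned physical large frame (so no migration is needed)."""
        base = region * PAGES_PER_LARGE
        vpns = range(base, base + PAGES_PER_LARGE)
        if not all(v in self.page_table for v in vpns):
            return False
        start = self.page_table[base]
        return start % PAGES_PER_LARGE == 0 and all(
            self.page_table[v] == start + (v - base) for v in vpns)

    def coalesce(self, region: int) -> bool:
        """Promote a region to a large page by updating mappings only."""
        if not self.can_coalesce(region):
            return False
        first_ppn = self.page_table[region * PAGES_PER_LARGE]
        self.large_mappings[region] = first_ppn // PAGES_PER_LARGE
        return True

    def splinter(self, region: int) -> None:
        """Demote a large page; base-page entries still point at the same memory."""
        self.large_mappings.pop(region, None)


alloc = ContiguityAllocator(num_large_frames=4)
# Allocate one full 2MB region plus 8 base pages of the next region in one call.
alloc.allocate(vaddr=0, size=LARGE_PAGE + 8 * BASE_PAGE)
print(alloc.coalesce(0))  # True: region 0 is fully mapped and contiguous
print(alloc.coalesce(1))  # False: region 1 is only partly mapped
```

Here region 0 can be promoted in place, while region 1 holds only eight base pages and stays splintered until the rest of it is allocated.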

The Department of Computer Science and Technology, Peking University, Beijing, China, has shown that a novel Dynamic Memory Mapping (DMM) model adds flexibility to virtual resource management and enables a feature-adjustable design for a virtual machine monitor (VMM). Because of its research significance, the study is reported in Issue 53 (June 2010) of SCIENCE CHINA Information Sciences. Memory is one of the most frequently accessed components in virtual machine (VM) systems. Because a VM's memory requirement varies with the applications it runs, ignoring these dynamic changes can lead to suboptimal use of memory resources and degrade the VM's performance. However, existing techniques are usually built on infrastructures that are independent of each other, so they exhibit poor extensibility, integrity, and maintainability. To improve the flexibility and extensibility of the VMM, a dynamic memory management mechanism must be implemented in the VMM while preserving the efficiency of memory accesses from virtual machines. To resolve these problems, the work proposes the DMM model [1]. DMM is a low-level memory management mechanism that allows the mapping between pseudo-physical memory, as seen by VMs, and machine memory to change dynamically while a virtual machine is running. On the one hand, DMM is independent of, yet compatible with, various virtualization architectures; on the other, it presents a uniform upward interface for supporting high-level memory management policies. As a result, the DMM layer bridges high-level policies and low-level implementations while keeping both adjustable. In this work, Prof. Wang, Prof. Luo, and their group present the principles of the DMM model and explain how various memory management policies, such as demand paging, virtual memory, and memory sharing, operate under this model. They also implement the DMM model in KVM, an open-source VMM.
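The idea of a remappable layer with a uniform upward interface can be sketched as follows. This Python sketch is hypothetical and does not reproduce the DMM implementation in KVM: a per-VM table maps pseudo-physical frame numbers (what the guest sees) to machine frame numbers and can change while the VM runs, and two example policies, demand paging and page sharing, are written only against its `lookup`/`map`/`unmap` calls. All names are assumptions; content comparison and copy-on-write protection for shared pages are deliberately omitted.

```python
from typing import Dict, List, Optional


class DynamicMemoryMap:
    """Hypothetical per-VM map: pseudo-physical frame number -> machine frame number."""

    def __init__(self) -> None:
        self.p2m: Dict[int, int] = {}

    def lookup(self, pfn: int) -> Optional[int]:
        """Return the backing machine frame, or None if the page is not backed yet."""
        return self.p2m.get(pfn)

    def map(self, pfn: int, mfn: int) -> None:
        """Point a guest page at a machine frame; may be called while the VM runs."""
        self.p2m[pfn] = mfn

    def unmap(self, pfn: int) -> Optional[int]:
        """Remove a mapping and return the machine frame it used, if any."""
        return self.p2m.pop(pfn, None)


# The policies below use only the uniform interface above.

def demand_page(dmm: DynamicMemoryMap, pfn: int, free_mfns: List[int]) -> int:
    """Back a guest page with a free machine frame on its first access."""
    mfn = dmm.lookup(pfn)
    if mfn is None:
        mfn = free_mfns.pop()
        dmm.map(pfn, mfn)
    return mfn


def share_page(src: DynamicMemoryMap, src_pfn: int,
               dst: DynamicMemoryMap, dst_pfn: int) -> Optional[int]:
    """Make dst_pfn use the same machine frame as src_pfn and return the frame
    dst_pfn used before (now reclaimable), or None if nothing was freed.
    A real policy would first verify identical contents and write-protect
    the shared frame; both steps are omitted here."""
    shared = src.lookup(src_pfn)
    if shared is None:
        raise ValueError("source page is not backed")
    if dst.lookup(dst_pfn) == shared:
        return None  # already sharing
    freed = dst.unmap(dst_pfn)
    dst.map(dst_pfn, shared)
    return freed
```

Keeping the policies on the calling side of a small interface is one way to let several of them coexist, which is the property the summary attributes to DMM's upward interface.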
They first designed a memory pool: a set of machine pages that the VMM provides to a particular virtual machine and whose size can be expanded or shrunk while the VM is running. To make the model work in a real system, they used a page-level protection mechanism to propagate memory-mapping updates to the shadow page tables, which are the only means by which a VM accesses its virtualized memory. They also used reverse mapping, a data structure that maps a machine page back to all the shadow page table entries that map it, to facilitate this propagation. The DMM model has several advantages over the memory management mechanisms in current VMMs. The first is platform independence: the model is defined abstractly and is therefore independent of particular implementations and computer architectures. The second is flexibility: the DMM model provides a uniform interface for integrating advanced memory management policies, and because all policies go through this general mechanism, they can work together without conflict. Last but not least, the modular, layered design of DMM reduces the complexity of a VMM's code base and therefore improves the security and dependability of the system. A journal reviewer noted: “This paper addresses the inefficiency in the design of current virtual machine monitors. Their approach is novel and systematic, and incurs only minor overheads. The result is of academic significance and practical value.” Another reviewer said, “It enriches and expands the capacity and capability of virtualization. It offers us new methods to deploy and manage large numbers of virtual machines.” A series of papers on virtual machine system optimization by Prof. Wang, Prof. Luo, and their group has been published in SCIENCE CHINA Information Sciences [2], IEEE Cluster [3], and ACM SIGOPS Operating Systems Review [4]. The authors are affiliated with the Institute of Network Computing and Information Systems (NCIS, http://ncis.pku.edu.cn) at Peking University. The institute, led by Prof. LI XiaoMing, conducts research mainly in high-productivity computing, search engines and Web mining, distributed systems, Internet and mobile computing, and database technology.
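The interplay of the memory pool, shadow page tables, and reverse mapping in the DMM implementation can be sketched in Python. This is a hypothetical model, not KVM code: shadow page table entries (SPTEs) are a dictionary keyed by an assumed (table id, entry index) pair, and a remap is propagated by invalidating every SPTE that the reverse map lists for the old machine frame, so the next guest access faults and refills the entry from the updated mapping. This only loosely mirrors the page-level protection approach; copying page contents to the new frame is omitted.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Optional, Set, Tuple

SpteKey = Tuple[int, int]  # hypothetical (shadow table id, entry index)


class ShadowMMU:
    """Hypothetical sketch of a resizable memory pool plus shadow entries
    and a reverse map from machine frames to the SPTEs that reference them."""

    def __init__(self, pool_frames: Iterable[int]) -> None:
        self.pool: Set[int] = set(pool_frames)        # machine frames granted to this VM
        self.p2m: Dict[int, int] = {}                 # pseudo-physical -> machine frame
        self.spte: Dict[SpteKey, Optional[int]] = {}  # None marks a not-present entry
        self.rmap: DefaultDict[int, Set[SpteKey]] = defaultdict(set)

    def grow_pool(self, frames: Iterable[int]) -> None:
        """Expand the VM's memory pool at run time."""
        self.pool.update(frames)

    def shrink_pool(self, mfn: int) -> None:
        """Return an unused machine frame to the VMM."""
        if mfn in self.p2m.values():
            raise ValueError("machine frame still backs guest memory")
        self.pool.discard(mfn)

    def map_guest_page(self, pfn: int, mfn: int) -> None:
        if mfn not in self.pool:
            raise ValueError("frame is not in this VM's pool")
        self.p2m[pfn] = mfn

    def on_shadow_fault(self, key: SpteKey, pfn: int) -> int:
        """Fill a shadow entry from the current mapping and record it in the reverse map."""
        old = self.spte.get(key)
        if old is not None:
            self.rmap[old].discard(key)
        mfn = self.p2m[pfn]
        self.spte[key] = mfn
        self.rmap[mfn].add(key)
        return mfn

    def remap(self, pfn: int, new_mfn: int) -> int:
        """Move a guest page to another pool frame; return how many SPTEs were invalidated."""
        if new_mfn not in self.pool:
            raise ValueError("frame is not in this VM's pool")
        old_mfn = self.p2m[pfn]
        self.p2m[pfn] = new_mfn
        stale = self.rmap.pop(old_mfn, set())
        for key in stale:
            self.spte[key] = None  # next access faults and refills from p2m
        return len(stale)


mmu = ShadowMMU(pool_frames=[100, 101])
mmu.map_guest_page(pfn=7, mfn=100)
mmu.on_shadow_fault((0, 3), pfn=7)
print(mmu.remap(pfn=7, new_mfn=101))  # 1: one shadow entry referenced frame 100
```

The reverse map is what keeps a remap cheap in this sketch: only the entries recorded for the old frame are touched, rather than every shadow page table of the VM.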

Articles published on Demand Paging

Fine-grain Quantitative Analysis of Demand Paging in Unified Virtual Memory

BARM: A Batch-Aware Resource Manager for Boosting Multiple Neural Networks Inference on GPUs With Memory Oversubscription

Exploring Synchronous Page Fault Handling

A quantitative evaluation of unified memory in GPUs

HPE: Hierarchical Page Eviction Policy for Unified Memory in GPUs

Enlarging I/O Size for Faster Loading of Mobile Applications

Mosaic

Efficient Proximity Search with Query Logs

GPUswap

Demand Paging Techniques for Flash Memory Using Compiler Post-Pass Optimizations

Dynamic memory mapping delivers additional flexibility to virtual resource management

Non-preemptive demand paging technique for NAND flash-based real-time embedded systems

Scratchpad Memory Management Techniques for Code in Embedded Systems without an MMU

Swap space management technique for portable consumer electronics with NAND flash memory

DMM: A dynamic memory mapping model for virtual machines

A new method of fast compression of program code for OTA updates in consumer devices

Energy and Performance Optimization of Demand Paging With OneNAND Flash

A Universal Online Caching Algorithm Based on Pattern Matching

Replacement and swapping strategy to improve read performance of portable consumer devices using compressed file systems

SWL