Abstract

In the traditional page-based memory management scheme, frequent page-table walks degrade performance and memory bandwidth utilization. A translation lookaside buffer (TLB) coalescing scheme mitigates these problems by utilizing the TLB efficiently and exploiting contiguity in physical memory. In modern system hardware, a memory transaction commonly accesses multiple data items concurrently. However, state-of-the-art TLB coalescing schemes do not fully exploit the data-level parallelism inherent in hardware. As a result, performance and memory bandwidth utilization can be degraded by the remaining page-table walk overheads. To alleviate these overheads, we propose compaction of allocated memory blocks (CAMB) in the page table. The proposed scheme significantly reduces page-table walks by exploiting the data-level parallelism in hardware and the block-level allocation in the operating system. We present a design, an analysis, a case study, an implementation, and an evaluation. Experiments are conducted on image processing workloads as an example. The results indicate that the proposed scheme improves performance and memory bandwidth utilization at modest cost.

Highlights

  • Modern systems-on-a-chip (SoCs) typically employ memory management units (MMUs) to enhance memory utilization and provide easy programming interfaces to programmers

  • The acquired page-table entry is stored in a translation lookaside buffer (TLB) to reduce the latency of a page-table walk

  • A cycle-accurate transaction-level performance model of an (IO)MMU is implemented in C++ and integrated into the simulation environment of [1]



Introduction

Modern systems-on-a-chip (SoCs) typically employ memory management units (MMUs) to enhance memory utilization and provide easy programming interfaces to programmers. The main role of an MMU is to provide isolation between the virtual and the physical address spaces. Invoking an MMU incurs performance overhead because of page-table walks: to translate a virtual address, the MMU accesses memory to acquire a page-table entry, a process referred to as a page-table walk (PTW). The acquired page-table entry is stored in a translation lookaside buffer (TLB) to reduce the latency of subsequent translations.

A. MEMORY ALLOCATION

Fig. 1 depicts the traditional MMU architecture. The OS allocates memory space, which is divided into pages. In Fig. 1(a1), the application requires eight pages, so the OS allocates virtual page numbers (VPNs) 8-15 to it. The OS then finds free contiguous pages in physical memory and allocates them at block level; in Fig. 1(a1), four physical blocks (Block0-Block3) are allocated.
