Memory Scheduling Research Articles

When multiple processor (CPU) cores and a GPU integrated together on the same chip share the off-chip main memory, requests from the GPU can heavily interfere with requests from the CPU cores, leading to low system performance and starvation of CPU cores. Unfortunately, state-of-the-art application-aware memory scheduling algorithms are ineffective at solving this problem at low complexity due to the large amount of GPU traffic. A large and costly request buffer is needed to provide these algorithms with enough visibility across the global request stream, requiring relatively complex hardware implementations. This paper proposes a fundamentally new approach that decouples the memory controller's three primary tasks into three significantly simpler structures that together improve system performance and fairness, especially in integrated CPU-GPU systems. Our three-stage memory controller first groups requests based on row-buffer locality. This grouping allows the second stage to focus only on inter-application request scheduling. These two stages enforce high-level policies regarding performance and fairness, and therefore the last stage consists of simple per-bank FIFO queues (no further command reordering within each bank) and straightforward logic that deals only with low-level DRAM commands and timing. We evaluate the design trade-offs involved in our Staged Memory Scheduler (SMS) and compare it against three state-of-the-art memory controller designs. Our evaluations show that SMS improves CPU performance without degrading GPU frame rate beyond a generally acceptable level, while being significantly less complex to implement than previous application-aware schedulers. Furthermore, SMS can be configured by the system software to prioritize the CPU or the GPU at varying levels to address different performance needs.
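The three-stage split described in the abstract above can be illustrated with a small sketch. It is not the authors' implementation: the `Request` fields, the function names, and in particular the round-robin policy in the second stage are assumptions made for illustration, since the abstract does not specify the inter-application policy. DRAM timing is not modelled.

```python
from collections import deque, namedtuple

# Hypothetical request record: issuing source (a CPU core or the GPU), bank, row.
Request = namedtuple("Request", "source bank row")


def form_batches(requests):
    """Stage 1: per source, group consecutive requests to the same row into batches."""
    batches = {}  # source -> deque of batches, each batch a list of requests
    for req in requests:
        queue = batches.setdefault(req.source, deque())
        if queue and (queue[-1][-1].bank, queue[-1][-1].row) == (req.bank, req.row):
            queue[-1].append(req)  # same row as the previous request: extend the batch
        else:
            queue.append([req])    # different row: start a new batch
    return batches


def schedule_batches(batches):
    """Stage 2: choose between sources, one whole batch at a time.

    Round-robin is a placeholder policy (an assumption), not the paper's policy.
    This function consumes the queues in `batches`.
    """
    order = []
    sources = sorted(batches)
    while any(batches[s] for s in sources):
        for s in sources:
            if batches[s]:
                order.append(batches[s].popleft())
    return order


def issue_to_banks(batch_order, num_banks):
    """Stage 3: per-bank FIFO queues with no further reordering inside a bank."""
    fifos = [deque() for _ in range(num_banks)]
    for batch in batch_order:
        for req in batch:
            fifos[req.bank].append(req)
    return fifos
```

Because the first stage already keeps same-row requests together and the second stage decides only which source goes next, the last stage needs nothing more than FIFO queues, which is the source of the complexity reduction the abstract claims.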

In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a shared DRAM system, requests from a thread can not only delay requests from other threads by causing bank/bus/row-buffer conflicts but they can also destroy other threads' DRAM-bank-level parallelism. Requests whose latencies would otherwise have been overlapped could effectively become serialized. As a result both fairness and system throughput degrade, and some threads can starve for long time periods. This paper proposes a fundamentally new approach to designing a shared DRAM controller that provides quality of service to threads, while also improving system throughput. Our parallelism-aware batch scheduler (PAR-BS) design is based on two key ideas. First, PAR-BS processes DRAM requests in batches to provide fairness and to avoid starvation of requests. Second, to optimize system throughput, PAR-BS employs a parallelism-aware DRAM scheduling policy that aims to process requests from a thread in parallel in the DRAM banks, thereby reducing the memory-related stall-time experienced by the thread. PAR-BS seamlessly incorporates support for system-level thread priorities and can provide different service levels, including purely opportunistic service, to threads with different priorities. We evaluate the design trade-offs involved in PAR-BS and compare it to four previously proposed DRAM scheduler designs on 4-, 8-, and 16-core systems. Our evaluations show that, averaged over 100 4-core workloads, PAR-BS improves fairness by 1.11X and system throughput by 8.3% compared to the best previous scheduling technique, Stall-Time Fair Memory (STFM) scheduling. Based on simple request prioritization rules, PAR-BS is also simpler to implement than STFM.
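The two ideas in the abstract above, batching and parallelism-aware ordering, can be sketched as follows. This is a simplified illustration, not the published design: the per-thread, per-bank marking cap, the ranking rule (threads with the lightest per-bank load first), and all names here are assumptions, and row-buffer state and DRAM timing are ignored.

```python
from collections import defaultdict, namedtuple

# Hypothetical request record: arrival order, issuing thread, target bank.
Request = namedtuple("Request", "arrival thread bank")


def form_batch(pending, cap):
    """Batching: mark at most `cap` oldest requests per (thread, bank).

    Marked requests form the current batch and are served before any newer
    request, so no thread can be starved indefinitely.
    """
    counts = defaultdict(int)
    batch = []
    for req in sorted(pending, key=lambda r: r.arrival):
        if counts[(req.thread, req.bank)] < cap:
            counts[(req.thread, req.bank)] += 1
            batch.append(req)
    return batch


def rank_threads(batch):
    """Rank threads within a batch, lightest load first (assumed rule).

    Load is (largest per-bank count, total count); the thread name only
    breaks ties deterministically.
    """
    per_bank = defaultdict(lambda: defaultdict(int))
    for req in batch:
        per_bank[req.thread][req.bank] += 1

    def load(thread):
        loads = per_bank[thread].values()
        return (max(loads), sum(loads), thread)

    return sorted(per_bank, key=load)


def service_order(batch):
    """Per-bank service order: higher-ranked thread first, then oldest first.

    Using the same thread ranking in every bank lets one thread's requests
    reach the head of several banks together and be served in parallel.
    """
    rank = {t: i for i, t in enumerate(rank_threads(batch))}
    by_bank = defaultdict(list)
    for req in sorted(batch, key=lambda r: (rank[r.thread], r.arrival)):
        by_bank[req.bank].append(req)
    return dict(by_bank)
```

With a cap of 2, a thread that floods one bank contributes only two requests to the batch there, and a lighter thread with one request in each of two banks is ranked first in both, so its two accesses overlap rather than serialize.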

Articles published on Memory Scheduling

Demand look-ahead memory access scheduling for 3D graphics processing units

Improving memory scheduling via processor-side load criticality information

MDC FFT/IFFT Processor With Variable Length for MIMO-OFDM Systems

Staged memory scheduling

GPU Accelerated Parallel Cholesky Factorization

The Application of Rapidly Produced Orthophoto to Aero Geophysical Survey

Prefetch-aware shared resource management for multi-core systems

A cost-effective heuristic to schedule local and remote memory in cluster computers

Thread Cluster Memory Scheduling

Memory performance and scopolamine: Hypoactivity of the thalamus revealed by cytochrome oxidase histochemistry

Iterative learning control of dynamic memory caching to enhance processing performance on java platform

Hippocampal heterogeneity in spatial memory revealed by cytochrome oxidase

Parallelism-Aware Batch Scheduling

Memory scheduling for modern microprocessors

Adaptive History-Based Memory Schedulers for Modern Processors

Partitioning and Scheduling DSP Applications with Maximal Memory Access Hiding

Evolutionary algorithms for the synthesis of embedded software

Efficient algorithm and architecture for post-processor in HDTV

An efficient pipelined parallel architecture for blocking effect removal in HDTV
