Software Overhead Research Articles

The abstraction of a shared memory space over separate CPU and GPU memory domains has eased the burden of portability for many HPC codebases. However, users pay for the ease of use provided by system-managed memory with a moderate-to-high performance overhead. NVIDIA Unified Virtual Memory (UVM) is currently the primary real-world implementation of such an abstraction and offers a functionally equivalent testbed for in-depth performance study of both UVM and future Linux Heterogeneous Memory Management (HMM) compatible systems. The continued advocacy for UVM and HMM motivates improvement of the underlying system. We focus on UVM-based systems and investigate the root causes of UVM overhead, a non-trivial task due to the complex interactions of multiple hardware and software constituents and the desired cost granularity. In our prior work, we delved deeply into UVM system architecture and showed the internal behaviors of page fault servicing in batches. We provided a quantitative evaluation of batch handling for various applications under different scenarios, including prefetching and oversubscription. We revealed that the driver workload depends on the interactions among application access patterns, GPU hardware constraints, and host OS components. Host OS components contribute significant overhead across implementations, warranting close attention. This extension furthers our prior study in three aspects: fine-grain cost analysis and breakdown, extension to multiple GPUs, and investigation of platforms with different GPU-GPU interconnects. We take a top-down approach to quantitative batch analysis and uncover how constituent component costs accumulate and overlap, governed by synchronous and asynchronous operations. Our multi-GPU analysis shows reduced cost of GPU-GPU batch workloads compared to CPU-GPU workloads. We further demonstrate that while specialized interconnects such as NVLink can improve batch cost, their benefits are limited by host OS software overhead and GPU oversubscription. This study serves as a proxy for future shared memory systems, such as those that interface with HMM, and for the development of interconnects.


Traveling wave ion mobility experiments using planar electrode structures (e.g., structures for lossless ion manipulation, TW-SLIM) leverage the mature manufacturing capabilities of printed circuit boards (PCBs). With routine levels of mechanical precision below 150 μm, the conceptual flexibility afforded by PCBs for use as planar ion guides is expansive. To date, the design and construction of TW-SLIM platforms require considerable legacy expertise, especially with respect to simulation and circuit layout strategies. To lower the barrier of TW-SLIM implementation, we introduce Python-based interactive tools that assist in graphical layout of the core electrode footprints for planar ion guides with minimal user inputs. These scripts also export the exact component locations and assignments for direct integration into KiCad and SIMION for PCB finalization and ion flight simulations. The design concepts embodied in the set of scripts comprising SLIM Pickins (PCB CAD generation) and pigsim (SIMION workspace generation) build upon the lessons learned in the independent development of the research-grade TW-SLIM platforms in operation at WSU. Due to the inherent flexibility of the PCB manufacturing process and the time devoted to board layouts prior to manufacturing, both scripts serve to enable rapid, iterative design considerations. Because only a few predefined parameters are necessary (i.e., the TW-SLIM monomer width, x position following a TW Turn, and y position following a TW Turn), it is possible to design the exact component layouts and accompanying simulation space in a matter of minutes. There is no known limitation to the board layout capacities of the scripts, and the size of a designed layout is ultimately constrained by the abilities of the final PCB design and simulation tools, KiCad and SIMION, to accommodate the thousands of electrodes comprising the final design (i.e., RAM and software overhead). Toward removing the barriers to exploring new SLIM tracks and the likelihood of layout errors that require considerable revision and engineering time, the SLIM Pickins and pigsim tools (included as Supporting Information) allow the user to quickly design a length of planar ion guide, simulate its ability to confine and transmit ions, compare hypothetical board outlines to given vacuum chamber dimensions, and generate a near-production-ready PCB CAD file. In addition to these tools, this report outlines a series of cost-saving strategies with respect to vacuum feedthroughs and vacuum chamber design for TW ion mobility experiments using planar ion guides.


Articles published on Software Overhead

Research on Fault Attacks of Lightweight Cryptographic Algorithms

Fine-grain Quantitative Analysis of Demand Paging in Unified Virtual Memory

SLIM Tricks: Tools, Concepts, and Strategies for the Development of Planar Ion Guides.

FlexCNN: An End-to-end Framework for Composing CNN Accelerators on FPGA

Sample-Efficient Adaptive Calibration of Quantum Networks Using Bayesian Optimization

CFFS: A Persistent Memory File System for Contiguous File Allocation With Fine-Grained Metadata

$\mathit {O(N)}$ Memory-Free Hardware Architecture for Burrows-Wheeler Transform

QBLKe: Host-side flash translation layer management for Open-Channel SSDs

DiPOSH: A portable OpenSHMEM implementation for short API‐to‐network path

DiG: enabling out-of-band scalable high-resolution monitoring for data-center analytics, automation and control (extended)

DySHARQ: Dynamic Software-Defined Hardware-Managed Queues for Tile-Based Architectures

Crab-tree

An efficient hybrid digital architecture for space vector PWM method for multilevel VSI

Using Approximate Computing and Selective Hardening for the Reduction of Overheads in the Design of Radiation-Induced Fault-Tolerant Systems

Modeling and Optimization for Self-powered Non-volatile IoT Edge Devices with Ultra-low Harvesting Power

Exploring Fault-Tolerant Erasure Codes for Scalable All-Flash Array Clusters

A Secure Exception Mode for Fault-Attack-Resistant Processing

Efficient implementation of MPI-3 RMA over openFabrics interfaces

OFDM-OAM Modulation for Future Wireless Communications

Channel autocorrelation-based dynamic slot scheduling for body area networks
