
A Lightweight Delay-based Authentication Scheme for DMA Attack Mitigation

Abstract

With the extensive application of the Direct Memory Access (DMA) technique, the efficiency of data transfer between peripherals and the host machine has improved dramatically. However, these optimizations also introduce security vulnerabilities: they expose data transmission to DMA attacks, which exploit direct memory access to steal data stored in the live memory of the victim system. In this paper, we propose a lightweight scheme that provides resilience to DMA attacks without physical or protocol-level modification. The proposed scheme constructs a unique identifier for each DMA-capable PCIe device from profiled timing measurements and builds a trusted database for authentication. The experimental results show that the proposed methodology eliminates most of the noise introduced during measurement for identifier construction, and that the authentication success rate is 100% for all devices.
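The abstract does not specify how the timing identifier is built or matched. The sketch below is a hypothetical illustration only, not the authors' method: it assumes each device is profiled by timing DMA transfers of several sizes, suppresses measurement noise with an interquartile-range (IQR) outlier filter followed by a median, and accepts a device only when every per-size latency lies within a relative tolerance of its enrolled value. The function names, the latency values, and the 5% tolerance are illustrative assumptions.

```python
from statistics import median, quantiles

# Hypothetical sketch: a device "fingerprint" is the denoised latency
# measured for each DMA transfer size. Not the paper's actual algorithm.

def denoise(samples):
    """Drop samples outside the 1.5*IQR fences, then take the median."""
    q1, _, q3 = quantiles(samples, n=4)  # requires at least 2 samples
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    kept = [s for s in samples if low <= s <= high]
    return median(kept) if kept else median(samples)

def fingerprint(profile):
    """profile maps transfer size (bytes) -> list of measured latencies (ns)."""
    return {size: denoise(lats) for size, lats in sorted(profile.items())}

def enroll(trusted_db, device_id, profile):
    """Store the fingerprint of a known-good device in the trusted database."""
    trusted_db[device_id] = fingerprint(profile)

def authenticate(trusted_db, device_id, profile, rel_tol=0.05):
    """Accept only if every per-size latency is within rel_tol of the enrolled one."""
    enrolled = trusted_db.get(device_id)
    if enrolled is None:
        return False  # unknown device: not in the trusted database
    measured = fingerprint(profile)
    if measured.keys() != enrolled.keys():
        return False  # profiled with a different set of transfer sizes
    return all(abs(measured[k] - enrolled[k]) <= rel_tol * enrolled[k]
               for k in enrolled)

# Usage with made-up latencies (the 500 and 3000 ns samples play the role of noise).
db = {}
enroll(db, "nic0", {64: [100, 101, 99, 100, 500], 4096: [900, 905, 895, 900, 3000]})
print(authenticate(db, "nic0", {64: [101, 100, 102, 99, 100], 4096: [902, 898, 901, 899, 3500]}))  # True
print(authenticate(db, "nic0", {64: [130, 131, 129, 130, 130], 4096: [1100, 1105, 1095, 1100, 1100]}))  # False
```

The median-based identifier is chosen here because a single delayed transfer barely moves it; a real scheme would also need to decide how many samples per size are enough and how tight the tolerance can be before genuine devices start failing.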

Similar Papers
  • Research Article
  • Cited by 2
  • 10.1145/3730582
DCMA: Accelerating Parallel DMA Transfers with a Multi-Port Direct Cached Memory Access in a Massive-Parallel Vector Processor
  • Jun 30, 2025
  • ACM Transactions on Architecture and Code Optimization
  • Gia Bao Thieu + 2 more

State-of-the-art applications, such as convolutional neural networks, demand specialized hardware accelerators that address performance and efficiency constraints. An efficient memory hierarchy is mandatory for such hardware systems. While the memory architectures of general-purpose processors (e.g., CPUs or GPUs) are based on cache systems, dedicated accelerators have mostly adopted the DMA (Direct Memory Access) concept because of their application field, image processing. DMA features like 2D data transfers or data padding can optimize the memory accesses of image processing. However, DMA lacks the capability to exploit temporal and spatial data reuse, a feature common in cache systems, particularly when multiple DMAs operate in parallel. This article proposes a novel Direct Cached Memory Access (DCMA) architecture, combining both DMA and cache methodologies and their respective advantages. Optimized for image-based AI algorithms, the DCMA architecture facilitates enhanced memory access by integrating multiple, parallel DMA ports with caching capabilities. This design allows for efficient data reuse and parallel memory access. Optimal parameters for the DCMA are determined through a comprehensive design space exploration. The DCMA is evaluated on a state-of-the-art Xilinx UltraScale+ FPGA board coupled with a massive-parallel vertical vector co-processor, called V²PRO. The results show that the DCMA mitigates the vector processor’s memory bottleneck. By using the proposed DCMA, speedups of up to 17× for the ResNet-50 CNN can be achieved.

  • Book Chapter
  • 10.1016/b978-075065796-9/50004-5
4 - The PC for real time work
  • Jan 1, 2003
  • Practical Data Acquisition for Instrumentation and Control Systems
  • John Park + 1 more


  • Research Article
  • Cited by 1
  • 10.1504/ijhpcn.2017.10005140
PvFPGA: paravirtualising an FPGA-based hardware accelerator towards general purpose computing
  • Jan 1, 2017
  • International Journal of High Performance Computing and Networking
  • Miodrag Bolic + 2 more

This paper presents an improved design of pvFPGA, a novel system design solution for virtualising an FPGA-based hardware accelerator by a virtual machine monitor (VMM). The accelerator design on the FPGA can be used to accelerate various applications, regardless of the applications' computation latencies. In the implementation, we adopt the Xen VMM to build a paravirtualised environment, and a Xilinx Virtex-6 as an FPGA accelerator. Data are transferred between the x86 server and the FPGA accelerator through direct memory access (DMA), and a streaming pipeline technique is adopted to improve the efficiency of data transfer. Several solutions to streaming pipeline hazards are discussed in this paper. In addition, we propose a technique, hyper-requesting, which enables portions of two requests bidding to different accelerator applications to be processed on the FPGA accelerator simultaneously through DMA context switches, to achieve request-level parallelism. The experimental results show that hyper-requesting reduces request turnaround time by up to 80%.

  • Research Article
  • Cited by 2
  • 10.1504/ijhpcn.2017.084246
PvFPGA: paravirtualising an FPGA-based hardware accelerator towards general purpose computing
  • Jan 1, 2017
  • International Journal of High Performance Computing and Networking
  • Wei Wang + 2 more


  • Research Article
  • 10.1016/j.micpro.2003.10.001
Template-based automatic data flow code generation for mediaprocessors
  • Nov 21, 2003
  • Microprocessors and Microsystems
  • Michael S Grow + 2 more


  • Conference Article
  • Cited by 8
  • 10.1109/iscas.2000.858794
Direct memory access frequency synthesizer for channel efficiency improvement in frequency hopping communication
  • May 28, 2000
  • C.M Yuen + 2 more

A frequency synthesizer using the direct memory access (DMA) technique is designed for frequency hopping spread spectrum (FH-SS) communication systems. The frequency synthesizer provides fast channel acquisition by using a simple memory table look-up technique. The technique simplifies the frequency control process and reduces the channel switching time. As a result, the channel efficiency can be improved.
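The abstract attributes the faster channel switching to replacing per-hop computation with a memory table look-up. As a hypothetical illustration (the abstract does not describe the synthesizer's internals), the sketch below assumes an integer-N PLL whose output frequency is N × f_ref and precomputes the divider N for every channel once; each hop then reduces to a single table index. In a DMA-based design, that table entry would be the word moved to the synthesizer's control register. All frequencies, names, and the channel count are made-up values.

```python
# Hypothetical integer-N PLL parameters (illustrative, not from the paper).
F_REF_HZ = 200_000          # reference frequency
BASE_HZ = 902_000_000       # frequency of channel 0
SPACING_HZ = 200_000        # channel spacing
NUM_CHANNELS = 50

def divider_for(freq_hz):
    """Integer-N PLL: f_out = N * f_ref, so N = f_out / f_ref."""
    n, rem = divmod(freq_hz, F_REF_HZ)
    if rem:
        raise ValueError("frequency is not on the reference grid")
    return n

# Computed once, ahead of time; in hardware this table would sit in memory.
DIVIDER_TABLE = [divider_for(BASE_HZ + k * SPACING_HZ) for k in range(NUM_CHANNELS)]

def hop(channel):
    """Per-hop work is a single look-up, with no arithmetic at switching time."""
    return DIVIDER_TABLE[channel]

print([hop(c) for c in (3, 17, 42)])  # [4513, 4527, 4552]
```

Because the table is filled before transmission starts, the cost of computing control words is paid once rather than on every hop, which is the kind of saving the abstract describes.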

  • Conference Article
  • Cited by 49
  • 10.1109/sp40001.2021.00018
DICE: Automatic Emulation of DMA Input Channels for Dynamic Firmware Analysis
  • May 1, 2021
  • Alejandro Mera + 3 more

Microcontroller-based embedded devices are at the core of the Internet of Things (IoT) and Cyber-Physical Systems (CPS). The security of these devices is of paramount importance. Among the approaches to securing embedded devices, dynamic firmware analysis (e.g., vulnerability detection) has gained great attention lately, thanks to its offline nature and low false-positive rates. However, regardless of the analysis and emulation techniques used, existing dynamic firmware analyzers share a major limitation: the inability to handle firmware that uses DMA (Direct Memory Access). This severely limits the types of devices supported and firmware code coverage. We present DICE, a drop-in solution for firmware analyzers to emulate DMA input channels and generate or manipulate DMA inputs (from peripherals to firmware). DICE is designed to be hardware-independent (i.e., no actual peripherals or DMA controllers needed) and compatible with common MCU firmware (i.e., no firmware-specific DMA usages assumed) and embedded architectures. The high-level idea behind DICE is the identification and emulation of the abstract DMA input channels, rather than the highly diverse peripherals and controllers. DICE identifies DMA input channels as the firmware writes the source and destination DMA transfer pointers into the DMA controller. Then DICE manipulates the input transferred through DMA on behalf of the firmware analyzer. DICE does not require firmware source code or additional features from firmware analyzers. We integrated DICE into the recently proposed firmware analyzer P²IM (for the ARM Cortex-M architecture) and a PIC32 emulator (for the MIPS M4K/M-Class architecture). We evaluated it on 83 benchmarks and sample firmware, representing 9 different DMA controllers from 5 different vendors. DICE detected 33 out of 37 DMA input channels, with 0 false positives.
It correctly supplied DMA inputs to 21 out of 22 DMA buffers that firmware actually uses, which previous firmware analyzers cannot achieve due to the lack of DMA emulation. DICE’s overhead is fairly low: it adds 3.4% on average to P²IM execution time. We also fuzz-tested 7 real-world firmware using DICE and compared the results with the original P²IM. DICE uncovered far more execution paths (as many as 79×) and found 5 unique, previously unknown bugs that are unreachable without DMA emulation. All our source code and dataset are publicly available.

  • Conference Article
  • Cited by 45
  • 10.1109/hpca.1997.569696
User-level DMA without operating system kernel modification
  • Feb 1, 1997
  • E.P Markatos + 1 more

Direct Memory Access (DMA) is frequently used to transfer data between the main memory of a host computer and the interconnection network, in order to free the host processor from the burden of the transfer. DMA operations are traditionally initiated by the operating system kernel, mainly to prevent one application from tampering with another application's data. Recent architecture trends suggest that interconnection networks are getting faster, while operating systems are getting slower (compared to processor speeds). These trends imply that the initiation of a DMA operation becomes slower (due to operating system involvement), while the DMA data transfer itself becomes faster with time. Soon, the operating system overhead associated with starting a DMA will be larger than the data transfer itself, especially for small data transfers. This paper proposes several algorithms that allow user-level applications to start DMA operations without the involvement of the operating system. Our algorithms allow user applications to have direct (but controlled) access to the DMA engine registers. Low-overhead user-level DMA is achieved without compromising protection, and without requiring changes to the underlying operating system kernel. Using our proposed algorithms, a DMA operation can be initiated in 2 to 5 assembly instructions. By comparison, operating-system-based initiation of DMA requires thousands of assembly instructions.

  • Research Article
  • Cited by 3
  • 10.5075/epfl-thesis-4672
Architectural Support for Coherent Architecturally Visible Storage in Instruction Set Extensions
  • Jan 1, 2010
  • Infoscience (Ecole Polytechnique Fédérale de Lausanne)
  • Ties Jan Henderikus Kluter

When it comes to performance, embedded systems share many problems with their higher-end counterparts. The growing gap between top processor frequency and memory access speed, the memory wall, is one such problem. Driven, in part, by low energy consumption and low cost requirements, embedded systems are often customized to a single application, or a very small set of applications. In addition, time-to-market requirements and the increasing complexity of embedded systems drive the need for fully or partially automated design tools and the extensive use of caches and cache hierarchies. The introduction of multi-processor-based embedded platforms has accelerated this trend; as the design space for embedded systems has grown, designers have become uncertain whether automatic processor customization tools can cope with this increased complexity. The recent introduction of new techniques addressing automatic customization, such as Architecturally Visible Storage (AVS) memory-enhanced Instruction Set Extension (ISE) identification algorithms, has also created new challenges. AVS memories are distinct from the cache hierarchy and rely on Direct Memory Access (DMA) transfers to communicate with main memory. In an embedded system containing hardware-managed caches, these extra AVS memories, in combination with their corresponding DMA transfers, cause coherence and consistence problems. Although the problems of coherence and consistence are well known in multi-processor systems, conventional solutions may be expensive in terms of area and power consumption, rendering them unacceptable for use in embedded systems. This thesis presents two low-cost coherence mechanisms that solve these two problems. The first mechanism addresses embedded systems that already contain a hardware coherence protocol, like many high-end embedded multi-processor systems. Traditionally, the DMA transactions are transparent to the hardware coherence protocol.
By ensuring visibility of these DMA transactions to the hardware coherence protocol, coherence can be guaranteed between AVS memories and data cache(s). As a result, minor changes to the DMA engine are required. Moreover, by forcing the processor pipeline to stall if a DMA transfer is active, memory consistence can be guaranteed. This mechanism provides significant speedup when compared to the execution of a non-ISE-enhanced system; however, due to the increase in bus traffic, this speedup comes at the expense of an increase in energy consumption. Coherent and Speculative DMA are both implementations of this mechanism. Single-processor systems do not contain hardware coherence protocols, and would therefore benefit from a lower-cost solution to the coherence and consistence problems than a hardware coherence protocol. By tightly coupling the AVS memories to the hardware cache, coherence and consistence for the complete system can be guaranteed. This coupling requires insignificant changes to the hardware cache's hit detection circuitry and state machine without influencing its critical path, thus it is inherently inexpensive. This mechanism provides significant speedups and reduces the energy consumed if compared to the execution on non-ISE-enhanced systems. Furthermore, the tight coupling enables direct communication between the AVS memories and the data cache, making this mechanism independent from the processor-to-memory distance. Virtual Ways and Way Stealing are both implementations of this mechanism. Besides enforcing coherence and consistence, the ability to integrate the architectural changes into an automated design flow is important. This thesis shows the influence of Coherent DMA, Speculative DMA, Virtual Ways, and Way Stealing on the ISE-identification algorithm. 
It shows the architectural requirements and the cost for enforcing coherence and consistence that need to be taken into account when applying these mechanisms in an automated flow without formulating new algorithms.

  • Book Chapter
  • Cited by 1
  • 10.1007/978-981-10-7665-7_10
An Efficient Approach to Manage DMA Descriptors and Evaluate PCIe-Based DMA Performance for ALICE Common Readout Unit (CRU)
  • Jan 1, 2018
  • S Mukherjee + 6 more

This paper presents the status of the performance evaluation of the Peripheral Component Interconnect Express (PCIe)-based Direct Memory Access (DMA) engine for the A Large Ion Collider Experiment Common Readout Unit (ALICE-CRU) upgrade program, using an advanced Intel Arria 10 FPGA. The CRU will read out the data of most of the upgraded sub-detectors and transfer it to the server through the PCIe DMA engine. The DMA engine moves data using descriptors, which the DMA controller pushes to the DMA engine. The main goal of this paper is to explain how the DMA controller should drive the DMA engine so that maximum DMA performance can be achieved. The DMA performance has been evaluated on various server-grade machines using the Intel Arria 10 FPGA kit (https://www.altera.com/products/boards_and_kits/dev-kits/altera/kit-a10-gx-fpga.html, [1]). The achieved throughput is around 95% of the theoretical DMA engine bandwidth.
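The descriptor flow described above, in which the controller hands descriptors to the engine, can be illustrated with a toy model. This is a hypothetical sketch, not the CRU firmware: it assumes the engine accepts descriptors through a fixed-depth FIFO and that the controller refills that FIFO whenever a descriptor completes, one common way to keep a DMA engine from idling between transfers. `Descriptor`, `DmaController`, the FIFO depth, and the addresses are illustrative names and values.

```python
from collections import deque
from dataclasses import dataclass

@dataclass
class Descriptor:
    host_addr: int  # destination buffer in host memory (hypothetical)
    length: int     # number of bytes to transfer

class DmaController:
    """Toy model: keeps the engine's descriptor FIFO topped up so it never idles."""

    def __init__(self, fifo_depth: int):
        self.fifo_depth = fifo_depth
        self.pending = deque()      # descriptors prepared by software
        self.engine_fifo = deque()  # descriptors already handed to the DMA engine

    def submit(self, desc: Descriptor) -> None:
        self.pending.append(desc)
        self._refill()

    def _refill(self) -> None:
        # Push descriptors toward the engine while its FIFO has free slots.
        while self.pending and len(self.engine_fifo) < self.fifo_depth:
            self.engine_fifo.append(self.pending.popleft())

    def engine_complete_one(self) -> int:
        """Simulate the engine finishing its oldest descriptor; return bytes moved."""
        done = self.engine_fifo.popleft()
        self._refill()
        return done.length

ctrl = DmaController(fifo_depth=4)
for i in range(10):
    ctrl.submit(Descriptor(host_addr=0x1000_0000 + i * 4096, length=4096))
print(len(ctrl.engine_fifo), len(ctrl.pending))  # 4 6
print(sum(ctrl.engine_complete_one() for _ in range(10)))  # 40960
```

Refilling on every completion means the engine always has queued work while descriptors remain, which is the property a controller needs to approach the engine's theoretical bandwidth.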

  • Research Article
  • Cited by 8
  • 10.1016/0165-0270(89)90122-2
A high-speed multichannel neural data acquisition system for IBM PC compatibles
  • Jan 1, 1989
  • Journal of Neuroscience Methods
  • James L Novak + 1 more


  • Research Article
  • Cited by 4
  • 10.1016/j.micpro.2021.104302
An efficient and scalable parallel mapping of pulse-Doppler radar signal processing chain on a multi-core DSP
  • Jun 22, 2021
  • Microprocessors and Microsystems
  • Abdessamad Klilou + 3 more


  • Conference Article
  • Cited by 1
  • 10.1109/iwsoc.2004.1319887
Interfacing in microprocessor-based systems with an advanced physical addressing
  • Sep 28, 2004
  • M Maamoun + 3 more

An architecture for interfacing the data exchange between microprocessor-based systems and external devices is presented. This architecture investigates the large capacity that extended physical addressing offers for interfacing, and uses both the direct memory access (DMA) technique and memory integration. This method will help improve the speed of data exchange.

  • Conference Article
  • 10.1109/mocast.2017.7937638
Heterogeneous computing system platform for high-performance pattern recognition applications
  • May 1, 2017
  • M Ali Mirzaei + 13 more

We present a system architecture made of a motherboard with a Xilinx Zynq System on Chip (SoC) and a mezzanine board equipped with an Associative Memory (AM) chip. The proposed architecture is designed to serve as an accelerator of general-purpose algorithms based on pipeline processing and pattern recognition. We present the open-source software and firmware developed to fully exploit the available communication channels between the ARM CPU and the FPGA, using the Direct Memory Access (DMA) technique, and with the AM, using Multi-Gigabit Transceivers (MGT). We report the measured performance and discuss potential applications and future developments. The proposed architecture is compact and portable, and provides a large communication bandwidth between components.

  • Conference Article
  • Cited by 11
  • 10.1109/lcn.2013.6761335
PCV: Predicting contact volume for reliable and efficient data transfers in opportunistic networks
  • Oct 1, 2013
  • Shiraz Qayyum + 3 more

Exploiting opportunistic contacts between mobile devices to enable deployment of real applications through reliable and efficient data transfers poses a significant research challenge. Indeed, accurate prediction of contact volume, defined as the maximum amount of data transferable during a contact, can improve the performance of deployments. However, existing schemes for estimating contact volume that rely on preconceived patterns or contact time distributions may not be applicable in uncertain environments. In this paper, we propose a novel scheme called PCV that predicts contact volume in soft real time to enable efficient and reliable data transfers in opportunistic networks. An Android application that learns data rate profiles has been developed to facilitate PCV. In addition, an analytical model has been developed to depict variable data rates between mobile devices. Extensive simulations are carried out on both synthetic and real-world mobility traces to validate the usefulness of PCV. Experimental results show the effectiveness of our approach in terms of reliable data transfers.
