FPGAPRO: A Defense Framework Against Crosstalk-Induced Secret Leakage in FPGA

Abstract

With the growth of cloud computing, FPGAs are being integrated into cloud servers for higher performance. Recent work has explored letting multiple users share the hardware resources of a remote FPGA, i.e., execute their own applications simultaneously. Although promising, multi-tenant FPGA brings its own security concerns. It has been demonstrated that the capacitive crosstalk between FPGA long-wires can serve as a side channel to extract secret information, giving adversaries the opportunity to mount crosstalk-based side-channel attacks. Moreover, recent work reveals that medium-wires and multiplexers in the configurable logic block (CLB) are also vulnerable to crosstalk-based information leakage. In this work, we propose FPGAPRO: a defense framework leveraging Placement, Routing, and Obfuscation to mitigate secret leakage on FPGA components, including long-wires, medium-wires, and logic elements in the CLB. As a user-friendly defense strategy, FPGAPRO focuses on protecting security-sensitive instances while accounting for critical path delay to maintain performance. As a proof of concept, the experimental results demonstrate that FPGAPRO can reduce crosstalk-caused side-channel leakage by a factor of 138. In addition, the performance analysis shows that the strategy keeps the design free of timing violations at its maximum frequency.

Similar Papers
  • Conference Article
  • Cite Count Icon 18
  • 10.1145/3400302.3415695
Information leakage from FPGA routing and logic elements
  • Nov 2, 2020
  • Ilias Giechaskiel + 1 more

Information leakage in FPGAs poses a danger whenever multiple users share the reconfigurable fabric, for example in multi-tenant Cloud FPGAs, or whenever a potentially malicious IP module is synthesized within a single user's design on an FPGA. In such scenarios, capacitive crosstalk between so-called long routing wires has been previously shown to be a security vulnerability in both Xilinx and Intel FPGAs. Specifically, both static and dynamic values on long wires have been demonstrated to affect the delays of the adjacent long wires, and such delay changes have been exploited to steal sensitive information such as bits of cryptographic keys. While long-wire leakage is now well-understood and can be defended against, this work presents two other, new types of information leaks that pose similar risks, but which have not been studied in the past, and for which existing defenses do not work. First, this paper shows that other types of routing resources (namely medium wires) are also vulnerable to crosstalk, with changes in their delays also measurable fully on-chip. Second, this work introduces a novel source of information leaks that originates from logic elements within the FPGA Configurable Logic Blocks (CLBs) and is likely not the result of the capacitive crosstalk effects investigated in prior work. To understand the potential impact of the two new leakage sources, this paper experimentally characterizes and compares them in four families of Xilinx FPGAs, and discusses potential countermeasures in the context of existing attacks and defenses.

  • Conference Article
  • Cite Count Icon 24
  • 10.5555/1950815.1950894
Area-efficient FPGA logic elements: architecture and synthesis
  • Jan 25, 2011
  • Jason Anderson + 1 more

We consider architecture and synthesis techniques for FPGA logic elements (function generators) and show that the LUT-based logic elements in modern commercial FPGAs are over-engineered. Circuits mapped into traditional LUT-based logic elements have speeds that can be achieved by alternative logic elements that consume considerably less silicon area. We introduce the concept of a trimming input to a logic function, which is an input to a K-variable function about which Shannon decomposition produces a cofactor having fewer than K - 1 variables. We show that trimming inputs occur frequently in circuits and we propose low-cost asymmetric FPGA logic element architectures that leverage the trimming input concept, as well as some other properties of a circuit's AND-inverter graph (AIG) functional representation. We describe synthesis techniques for the proposed architectures that combine a standard cut-based FPGA technology mapping algorithm with two straightforward procedures: 1) Shannon decomposition, and 2) finding non-inverting paths in the circuit's AIG. The proposed architectures exhibit improved logic density versus traditional LUT-based architectures with minimal impact on circuit speed.
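
The trimming-input definition in this abstract can be illustrated with a small brute-force check. The following is only a sketch under stated assumptions: a logic function is modeled as a plain Python callable on 0/1 arguments, and the helper names (`cofactor`, `support_size`, `trimming_inputs`) are hypothetical, not taken from the paper, whose synthesis flow works on AIGs rather than by exhaustive enumeration.

```python
from itertools import product


def cofactor(f, i, value):
    """Shannon cofactor of f with input i fixed to value (takes one fewer input)."""
    def g(*rest):
        args = list(rest[:i]) + [value] + list(rest[i:])
        return f(*args)
    return g


def support_size(g, n):
    """Number of the n inputs of g that can actually change its output."""
    count = 0
    for j in range(n):
        for bits in product((0, 1), repeat=n):
            flipped = list(bits)
            flipped[j] ^= 1
            if g(*bits) != g(*flipped):
                count += 1
                break
    return count


def trimming_inputs(f, k):
    """Indices i such that some cofactor about input i depends on fewer than k - 1 inputs."""
    found = []
    for i in range(k):
        if any(support_size(cofactor(f, i, v), k - 1) < k - 1 for v in (0, 1)):
            found.append(i)
    return found


def mux(s, a, b):
    """2:1 multiplexer: selects a when s is 1, otherwise b."""
    return a if s else b


def xor3(a, b, c):
    """3-input XOR: every cofactor still depends on both remaining inputs."""
    return a ^ b ^ c
```

For the multiplexer, only the select input qualifies, since fixing `s` leaves a function of a single variable; for the 3-input XOR no input qualifies.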

  • Conference Article
  • Cite Count Icon 32
  • 10.1109/aspdac.2011.5722215
Area-efficient FPGA logic elements: Architecture and synthesis
  • Jan 1, 2011
  • Jason H Anderson + 1 more

We consider architecture and synthesis techniques for FPGA logic elements (function generators) and show that the LUT-based logic elements in modern commercial FPGAs are over-engineered. Circuits mapped into traditional LUT-based logic elements have speeds that can be achieved by alternative logic elements that consume considerably less silicon area. We introduce the concept of a trimming input to a logic function, which is an input to a K-variable function about which Shannon decomposition produces a cofactor having fewer than K - 1 variables. We show that trimming inputs occur frequently in circuits and we propose low-cost asymmetric FPGA logic element architectures that leverage the trimming input concept, as well as some other properties of a circuit's AND-inverter graph (AIG) functional representation. We describe synthesis techniques for the proposed architectures that combine a standard cut-based FPGA technology mapping algorithm with two straightforward procedures: 1) Shannon decomposition, and 2) finding non-inverting paths in the circuit's AIG. The proposed architectures exhibit improved logic density versus traditional LUT-based architectures with minimal impact on circuit speed.

  • Conference Article
  • Cite Count Icon 31
  • 10.1109/icfpt47387.2019.00060
HILL: A Hardware Isolation Framework Against Information Leakage on Multi-Tenant FPGA Long-Wires
  • Dec 1, 2019
  • Yukui Luo + 1 more

FPGA has recently been deployed in the multi-tenant cloud to provide high-performance computing capabilities. Such deployment of FPGA creates a new attack surface for adversaries. It has been recently demonstrated that the capacitive crosstalk between FPGA long-wires can be used as a side-channel to extract secret information. In this paper, we present HILL: a Hardware Isolation framework against information Leakage on multi-tenant FPGA Long-wires. As a defense framework, HILL can prioritize the placement and routing of security-critical hardware instances and isolate them from other parts and tenants. For data and communication interfaces that use FPGA long-wires, such as UART, PCIe, and AXI4, HILL employs a long-wire obfuscation technique to reduce the side-channel leakage. We evaluate the performance of HILL with Xilinx Artix-7 FPGAs using two prevalent FPGA development tools: Xilinx ISE 14.7 and Vivado 2018.3. The experimental results demonstrate that HILL can effectively reduce the crosstalk-caused side-channel leakage by 138 times. The long-wire obfuscation technique reduces the correlation between the side-channel leakage and secret key from 81.7% to 50.3%, which is close to random guess.

  • Conference Article
  • Cite Count Icon 5
  • 10.1109/nssmic.2014.7431140
8-channel 14-Bit 125MHz FADC electronics with 1G Ethernet readout based on ZYNQ for HPGe Detector
  • Nov 1, 2014
  • Tao Xue + 3 more

Traditional High Purity Germanium (HPGe) detectors need roughly 14-bit Flash Analog-to-Digital Converters (FADCs) for digitization. At CJPL in China, a 10 kg HPGe detector package is being developed for dark matter research; every HPGe module is based on a 3 kg HPGe detector, and 40 FADC channels are needed in total. The 8-channel, 125 MHz, 14-bit FADC electronics are designed for analog signal digitization. ZYNQ is a new architecture from Xilinx that includes a dual-core 800 MHz 32-bit ARM Cortex-A9 processor and advanced FPGA logic elements. The dual Cortex-A9 processor runs embedded Linux with a stable, conventional TCP/IP stack, which is easier to use and to program in C than a TCP/IP stack built on FPGA logic, and can be easily updated. With the 1000 Mbps Ethernet interface, it delivers more than 500 Mbps of Ethernet data throughput without any programming tricks. The FPGA logic elements interface with the FADC, the LTM9011-14 from Linear Technology, through a high-speed LVDS interface. Data from the FADC can be buffered to high-bandwidth DDR3 SDRAM through the high-speed DMA interface with little CPU intervention. The C code running in embedded Linux and the logic written in VHDL are combined in one chip with a small footprint and lower power consumption. Furthermore, we present the system's benchmark, with more than 500 Mbps Ethernet throughput, and the ENOB of the FADC.

  • Research Article
  • Cite Count Icon 14
  • 10.1016/j.infrared.2017.07.007
An improved non-uniformity correction algorithm and its hardware implementation on FPGA
  • Jul 8, 2017
  • Infrared Physics & Technology
  • Shenghui Rong + 5 more


  • Conference Article
  • Cite Count Icon 1
  • 10.1109/eiconrus49466.2020.9039052
FPGAs Logic Checking Method by Genetic Algorithms
  • Jan 1, 2020
  • Ekaterina Y Danilova

Programmable logic integrated circuits are widely used in digital equipment, including equipment for critical applications. Therefore, the diagnosis of FPGA logic is an urgent task. Since the number of logic elements in modern FPGAs reaches millions and the number of variables of an individual logic element reaches eight, the task of diagnosis is complicated. Existing methods for solving this problem often do not produce the needed results in an acceptable time. The use of heuristic algorithms, including genetic algorithms (GA), for diagnosing logic elements requires refinement to reflect these new factors in order to improve the built-in testing of FPGAs. GA for the diagnosis of FPGA logic elements can significantly accelerate the diagnosis without losing its quality. This article investigates methods and tools for diagnosing FPGA logic for highly reliable applications. The task is to perfect the methods for diagnosing FPGA logic that uses new logic elements, based on genetic algorithms.

  • Conference Article
  • Cite Count Icon 3
  • 10.1145/1216919.1216922
Design of a logic element for implementing an asynchronous FPGA
  • Feb 18, 2007
  • Scott C Smith

A reconfigurable logic element (LE) is developed for use in constructing a NULL Convention Logic (NCL) FPGA. It can be configured as any of the 27 fundamental NCL gates, including resettable and inverting variations, and can utilize embedded registration for gates with three or fewer inputs. The developed LE is compared with a previous NCL LE, showing that the one developed herein yields a more area efficient NCL circuit implementation. The NCL FPGA logic element is simulated at the transistor level using the 1.8V, 180nm TSMC CMOS process.

  • Research Article
  • Cite Count Icon 14
  • 10.1145/1462586.1462592
Compute Bound and I/O Bound Cellular Automata Simulations on FPGA Logic
  • Jan 1, 2009
  • ACM Transactions on Reconfigurable Technology and Systems
  • S Murtaza + 2 more

FPGA-based computation engines have been used as Cellular Automata accelerators in the scientific community for some time now. With the recent availability of more advanced FPGA logic it becomes necessary to better understand the mapping of Cellular Automata to these systems. There are many trade-offs to consider when mapping a Cellular Automata algorithm from an abstract system to the physical implementation using FPGA logic. The trade-offs include both the available FPGA resources and the Cellular Automata algorithm's execution time. The most important aspect is to fully understand the behavior of the specified CA algorithm in terms of its execution times which are either compute bound or I/O bound. In this article, we present a methodology to categorize a specified CA algorithm as a compute bound or an I/O bound. We take the methodology further by presenting rigorous analysis for each of the two cases identifying the various parameters that control the mapping process and are defined both by the Cellular Automata algorithm and the given FPGA hardware specifications. This methodology helps to predict the performance of running Cellular Automata algorithms on specific FPGA hardware and to determine optimal values for the various parameters that control the mapping process. The model is validated for both compute and I/O bound two-dimensional Cellular Automata algorithms. We find that our model predictions are accurate within 7%.

  • Research Article
  • 10.4028/www.scientific.net/amm.130-134.37
An Improved High-Accuracy CORDIC Algorithm for DSC in Endoscopic Ultrasonography System
  • Oct 1, 2011
  • Applied Mechanics and Materials
  • Yan Li + 5 more

An Improved High-Accuracy CORDIC (COordinate Rotation DIgital Computer) algorithm for digital scan conversion is presented in this paper to enhance the accuracy and speed of coordinate conversion for Endoscopic Ultrasonography. Several optimizations are applied so that coordinate conversion is implemented more accurately with fewer FPGA resources. In the paper, the Cartesian coordinates are re-demarcated to save LE (Logic Element) resources of the FPGA. The bit width of the data, the scale factor correction, and the convergence range are all optimized to improve the accuracy of the algorithm. Furthermore, special processing of the near-field data is carried out to reduce the errors of digital scan conversion. With a fully pipelined structure implemented on FPGA, the Improved High-Accuracy CORDIC algorithm is validated by both simulation and real-time ultrasound imaging experiments, showing enhanced accuracy and improved image quality.
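
For readers unfamiliar with the baseline that this abstract improves on, here is a minimal sketch of textbook rotation-mode CORDIC. It is not the paper's improved variant: it uses floating point rather than fixed-point bit widths, and the function name `cordic_rotate` and the default of 16 iterations are assumptions made for illustration. It does show where the scale factor correction and the convergence range mentioned above enter.

```python
import math


def cordic_rotate(angle, iterations=16):
    """Approximate (cos(angle), sin(angle)) with rotation-mode CORDIC.

    Converges only for |angle| up to about 1.74 rad (the sum of the
    elementary angles atan(2**-i)), which is the convergence range.
    """
    # Each micro-rotation stretches the vector; pre-divide by the total gain
    # so the result has unit length (the scale factor correction).
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0      # rotate toward the residual angle
        shift = 2.0 ** -i                # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)        # table lookup in hardware
    return x, y


def polar_to_cartesian(r, theta, iterations=16):
    """Polar-to-Cartesian conversion built on the rotation above."""
    c, s = cordic_rotate(theta, iterations)
    return r * c, r * s
```

In a hardware implementation the multiplications by `shift` become wired shifts and the `atan` values come from a small constant table, which is why the method suits FPGA logic elements.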

  • Book Chapter
  • Cite Count Icon 25
  • 10.1007/978-3-642-28365-9_5
Table-Based Division by Small Integer Constants
  • Nov 17, 2011
  • Florent De Dinechin + 1 more

Computing cores to be implemented on FPGAs may involve divisions by small integer constants in fixed or floating point. This article presents a family of architectures addressing this need. They are derived from a simple recurrence whose body can be implemented very efficiently as a look-up table that matches the hardware resources of the target FPGA. For instance, division of a 32-bit integer by the constant 3 may be implemented by a combinatorial circuit of 48 LUT6 on a Virtex-5. Other options are studied, including iterative implementations, and architectures based on embedded memory blocks. This technique also computes the remainder. An efficient implementation of the correctly rounded division of a floating-point constant by such a small integer is also presented.
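
The recurrence described in this abstract can be sketched in software as digit-by-digit long division in radix 2^k, where each step is a single table lookup. This is an illustrative sketch, not the paper's architecture: the function names, the dictionary-based table, and the choice k = 4 in the usage below are assumptions, and it makes no attempt to reproduce the LUT counts quoted above.

```python
def build_table(d, k):
    """Map (remainder_in, digit) to (quotient_digit, remainder_out) for divisor d."""
    radix = 1 << k
    return {
        (r, x): divmod(r * radix + x, d)
        for r in range(d)
        for x in range(radix)
    }


def divide_by_constant(x, d, k, n_digits):
    """Return (x // d, x % d) for 0 <= x < 2**(k * n_digits), one lookup per digit."""
    table = build_table(d, k)
    mask = (1 << k) - 1
    quotient, remainder = 0, 0
    # Scan the radix-2**k digits of x from most to least significant.
    for i in reversed(range(n_digits)):
        digit = (x >> (i * k)) & mask
        q_digit, remainder = table[(remainder, digit)]
        quotient = (quotient << k) | q_digit
    return quotient, remainder
```

Because the incoming remainder is always below d, each quotient digit fits in k bits, so the table has d * 2^k entries. The final remainder comes out of the last step at no extra cost, consistent with the abstract's remark that the technique also computes the remainder.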

  • Conference Article
  • 10.1109/iecon.2006.347265
Architecture of a Real-Time Wavelet Transform Calculation SoPC Core for Industrial Applications
  • Nov 1, 2006
  • Jagoba Arias + 4 more

The wavelet transform has found application in a large range of fields, from image processing to communications. This paper describes the design and implementation of a core for the calculation of the wavelet transform of an incoming signal. This core may be connected to other similar cores to perform higher-level wavelet transforms. Due to its flexibility, the circuit described in this paper may be inserted in a SoPC, thus building a digital system that is able to calculate this rather complex operation in real time, using the hardware resources of a low-cost FPGA and leaving the results available for the rest of the cores present in the SoPC. The data processing results achieved with this core implementation allow the calculation of the wavelet transform even for high-speed signals, and the possibility of including this core in a SoPC enables higher-level operations, such as data compression, pulse detection, etc.

  • Conference Article
  • Cite Count Icon 18
  • 10.1145/2967413.2967430
A Holistic Approach for Optimizing DSP Block Utilization of a CNN implementation on FPGA
  • Sep 12, 2016
  • Kamel Abdelouahab + 5 more

Deep Neural Networks are becoming the de facto standard models for image understanding, and more generally for computer vision tasks. As they involve highly parallelizable computations, CNNs are well suited to current fine-grain programmable logic devices. Thus, multiple CNN accelerators have been successfully implemented on FPGAs. Unfortunately, FPGA resources such as logic elements or DSP units remain limited. This work presents a holistic method relying on approximate computing and design space exploration to optimize the DSP block utilization of a CNN implementation on an FPGA. This method was tested by implementing a reconfigurable OCR convolutional neural network on an Altera Stratix V device and varying both data representation and CNN topology in order to find the best combination in terms of DSP block utilization and classification accuracy. This exploration generated dataflow architectures of 76 CNN topologies with 5 different fixed-point representations. The most efficient implementation performs 883 classifications/sec at 256 x 256 resolution using 8% of the available DSP blocks.

  • Conference Article
  • 10.1109/mace.2011.5988010
The control system and algorithm of brushless DC motor based on DSP
  • Jul 1, 2011
  • Dazhai Li + 2 more

Based on the characteristics of the brushless DC motor (BLDCM), and taking full advantage of the high-speed computation ability of the DSP and the hardware resources of the FPGA, a DSP-based BLDCM system is designed. In this paper, the hardware of this system is introduced first, and the role of each part is presented. Subsequently, the design of a self-adaptive fuzzy PID controller, as well as the advantages of the corresponding algorithm, are discussed.

  • Research Article
  • Cite Count Icon 5
  • 10.1109/access.2022.3230066
FPGA Acceleration of a Composite Kernel SVM for Hyperspectral Image Classification
  • Jan 1, 2023
  • IEEE Access
  • Kento Tajiri + 1 more

Hyperspectral image classification is one of the most important techniques for analyzing hyperspectral images, which have hundreds of spectral luminance values from near-infrared to visible light. For this classification, supervised learning methods are widely used, but they typically trade off accuracy against computational complexity. Our approach is based on a composite kernel method, and the computation is simplified to achieve higher processing speeds by efficiently using the hardware resources of the FPGA. The accuracy of this approach reaches 98.0% and 98.8% on two benchmark datasets, Indian Pines and Salinas, via simulation, which is comparable to that of previous works. Two implementations, one with fewer hardware resources but more off-chip memory bandwidth, and another with more hardware resources but less off-chip memory bandwidth, are implemented on an FPGA and evaluated. The processing speeds of the two implementations are the same: 1.3 Mpixels/s for 2048-pixel-wide images. This processing speed is fast enough for real-time processing, and faster than previous studies when normalized by hardware size and power consumption. We also introduce two more implementations that aim to reduce the on-chip memory usage of the second implementation within a reasonable increase of off-chip memory bandwidth, and we discuss which implementation is advantageous under what conditions.
