Datapath Architecture Research Articles

Efficient implementations of software masked designs are both an important goal and a significant challenge for security against Side-Channel Analysis (SCA) attacks. In this paper we discuss the gap between generic C implementations and optimized (inline-) assembly versions, provide a large spectrum of efficient and generic masked implementations for any order, and demonstrate cryptographic algorithms and masking gadgets with reference to the state of the art. Our main goal is to show the principal performance gaps that can be expected between different implementations and to suggest how to harness the underlying hardware efficiently, a daunting task across masking orders and masking algorithms (multiplication, refreshing, etc.). This paper focuses on implementations targeting wide-vector bitsliced designs, such as the ISAP algorithm. We explore concrete implementations on processors that support the wide-vector extensions of the AMD64 Instruction Set Architecture (ISA), namely the SSE2/3/4.1, AVX2 and AVX-512 Streaming Single Instruction Multiple Data (SIMD) extensions. These extensions mainly enable efficient memory-level parallelism and provide a gradual reduction in computation time as a function of the level of extension and the hardware support for instruction-level parallelism. For the first time we provide a complete open-source repository of such gadgets tailored to these extensions, for various gadget types and for all orders. We evaluate the disparities between generic high-level-language masked implementations, optimized (inline-) assembly, and conventional single-execution-path datapath architectures such as ARM. We underscore the crucial trade-off between storing the state in data memory and keeping it in the register file (RF). This trade-off is specific to masked designs and is particularly difficult to resolve because it requires inline-assembly manipulation and is not natively supported by compilers. Moreover, as the masking order (d) increases and the state grows, data-memory read/write accesses for state handling must increase, since the RF is simply not large enough. This requires careful optimization that depends to a considerable extent on the algorithm being implemented. We discuss how full utilization of the SSE extensions is not always possible, namely when d is not a power of two, and pinpoint both the optimal values of d and the highly sub-optimal values that severely under-utilize the hardware. More generally, this paper presents several fully generic masked implementations for any order, as well as multiple highly optimized (inline-) assembly instances that remain quite generic (covering a wide spectrum of ISAs and extensions), and provides very specific implementations targeting specific extensions. The goal is to promote open-source availability, research, improvement and implementations relating to SCA security and masked designs. The building blocks and methodologies provided here are portable and can easily be adapted to other algorithms.
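
To make the notion of a masking gadget concrete, the following is a minimal functional sketch of a bitsliced masked AND (multiplication) at order d, written in the well-known ISW style. It is an illustration only and is not taken from the paper's repository: the function names `share`, `unshare` and `isw_and` are hypothetical, the 64-bit word width is an assumption, and Python offers none of the register-level control or leakage guarantees that the abstract is concerned with.

```python
import secrets

WORD_BITS = 64                      # assumed bitslice width (one word = 64 parallel bits)
WORD_MASK = (1 << WORD_BITS) - 1


def share(value, d):
    """Split a word into d+1 Boolean shares whose XOR equals the word."""
    shares = [secrets.randbits(WORD_BITS) for _ in range(d)]
    last = value & WORD_MASK
    for s in shares:
        last ^= s
    return shares + [last]


def unshare(shares):
    """Recombine shares by XOR (for checking only; never done in a real masked design)."""
    acc = 0
    for s in shares:
        acc ^= s
    return acc


def isw_and(a, b):
    """ISW-style masked AND: returns a sharing of (XOR of a) & (XOR of b)."""
    n = len(a)
    assert n == len(b), "both operands must use the same number of shares"
    c = [a[i] & b[i] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = secrets.randbits(WORD_BITS)          # fresh randomness per share pair
            c[i] ^= r
            # Bracketing keeps the cross terms masked by r before they are combined.
            c[j] ^= (r ^ (a[i] & b[j])) ^ (a[j] & b[i])
    return c


# Usage: a second-order (d = 2) masked AND of two words.
x, y = 0x0F0F, 0x00FF
assert unshare(isw_and(share(x, 2), share(y, 2))) == (x & y)
```

The double loop performs d(d+1)/2 randomness draws and touches every pair of shares, which gives a feel for why the state and the number of memory accesses grow quickly with the masking order.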

This paper proposes ReAdapt, a reconfigurable datapath architecture for scaling the energy-quality trade-off of adaptive filtering at runtime. ReAdapt can dynamically select among four adaptive filtering algorithms of graded complexity at runtime by reconfiguring the processing flow in its datapath and by data-gating unused modules to block their switching activity (i.e., to reduce CMOS dynamic power). The energy-quality trade-off is scaled by choosing among four levels of filter algorithm complexity: 1) least mean square (LMS); 2) partial update normalized LMS (PU-NLMS); 3) set-membership normalized LMS (SM-NLMS); 4) normalized LMS (NLMS). The ReAdapt architecture reuses the modules common to these adaptive filters, resulting in a compact VLSI hardware implementation. Its operation is demonstrated in a case study on interference mitigation for electroencephalogram (EEG) signal processing. The hardware synthesis results show a 6.80-fold increase in throughput and at least a 2.84-fold reduction in energy per operation compared with state-of-the-art adaptive filters. This paper also investigates the benefits of dynamically reconfiguring the four ReAdapt operating modes at runtime for different signal-to-noise ratio (SNR) levels of the processed signals. We also demonstrate that dynamically reconfiguring the ReAdapt operating modes at runtime yields an optimal energy-quality trade-off, which is advantageous over a conventional single static mode.
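
As a rough software analogue of mode selection, the sketch below implements one coefficient update for three of the four algorithms in their textbook forms, with the mode chosen per call. It is a minimal illustration under stated assumptions and does not describe the ReAdapt hardware: the function name `adapt_step`, the parameters `mu`, `eps` and `gamma`, and their default values are all hypothetical, and PU-NLMS is omitted because its coefficient-selection rule varies between formulations.

```python
def adapt_step(w, x, d, mode="nlms", mu=0.5, eps=1e-8, gamma=0.1):
    """One adaptive-filter update. Returns (new_weights, error).

    w     : current filter coefficients
    x     : input (regressor) vector, same length as w
    d     : desired output sample
    mode  : "lms", "nlms" or "sm-nlms" (textbook forms, assumed here)
    mu    : step size; eps: regularizer; gamma: SM-NLMS error bound
    """
    y = sum(wi * xi for wi, xi in zip(w, x))      # filter output
    e = d - y                                     # a priori error
    energy = sum(xi * xi for xi in x)             # squared norm of the input vector

    if mode == "lms":
        step = mu                                 # fixed step, no normalization
    elif mode == "nlms":
        step = mu / (eps + energy)                # step normalized by input energy
    elif mode == "sm-nlms":
        # Update only when the error exceeds the bound; otherwise skip the update.
        step = (1.0 - gamma / abs(e)) / (eps + energy) if abs(e) > gamma else 0.0
    else:
        raise ValueError("unknown mode: " + mode)

    return [wi + step * e * xi for wi, xi in zip(w, x)], e
```

In this form the modes share the output, error and update computations and differ only in how the step is derived, which is the kind of commonality a shared datapath can exploit. The SM-NLMS branch also shows where work can be skipped: when the error is already within the bound, no coefficient changes.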

Articles published on Datapath Architecture

HLS‐based swarm intelligence driven optimized hardware IP core for linear regression‐based machine learning

MaskSIMD-lib: on the performance gap of a generic C optimized assembly and wide vector extensions for masked software with an Ascon-p test case

PSO based exploration of multi-phase encryption based secured image processing filter hardware IP core datapath during high level synthesis

ReAdapt: A Reconfigurable Datapath for Runtime Energy-Quality Scalable Adaptive Filters

CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration With Better-Than-Binary Energy Efficiency

A new ASIC implementation of an advanced encryption standard (AES) crypto-hardware accelerator

Real-time implementation of fast discriminative scale space tracking algorithm

Design and analysis of high performance and low power FFT for DSP datapath using Vedic Multipliers

Implementation of Two-Dimensional (2D) Discrete Cosine Transform (DCT) using Reversible Gates

A 16-GB 640-GB/s HBM2E DRAM With a Data-Bus Window Extension Technique and a Synergetic On-Die ECC Scheme

SliceNetVSwitch: Definition, Design and Implementation of 5G Multi-Tenant Network Slicing in Software Data Paths

Energy Efficient Low Latency Multi-issue Cores for Intelligent Always-On IoT Applications

A 1.5 mW Programmable Acoustic Signal Processor for Hearing Assistive Devices With Speech Intelligibility Enhancement

Design and implementation of various datapath architectures for the ANU lightweight cipher on an FPGA

Efficient hardware implementations of QTL cipher for RFID applications

Impact of hardware steganography on DSP core datapath

Enhanced SPIHT Algorithm with Pipelined Datapath Architecture Design

Energy Efficient and Side-Channel Secure Cryptographic Hardware for IoT-Edge Nodes

Optimal Scheduling for Exposed Datapath Architectures with Buffered Processing Units by ASP

Variable Length Instruction Compression on Transport Triggered Architectures
