AVX-512 Instruction Research Articles

The QPC-TDSE program is a general tool for studying laser-driven electron dynamics in ideal isolated atoms and molecules. It solves the full-dimensional non-relativistic time-dependent Schrödinger equation (TDSE) within the single-active-electron approximation. The full-dimensional electronic wavefunction is expanded in spherical coordinates using spherical harmonics and B-spline functions, and a set of parallel Crank-Nicolson propagators combined with split-operator techniques evolves it in time. The propagators support centrifugal and multi-polar static potentials, which covers atomic and molecular scenarios, and accept arbitrary combinations of linearly or elliptically polarized lasers within the dipole approximation. The program can extract the photoelectron momentum distribution via the t-SURFF approach or by projection onto either the exact scattering states or plane waves. Example applications include above-threshold ionization, the attosecond clock, and higher-order harmonic generation.

Program summary
Program Title: QPC-TDSE
CPC Library link to program files: https://doi.org/10.17632/xjm3kfgv75.1
Licensing provisions: GPLv3
Programming language: C++
External libraries: HDF5, GSL, MKL
Nature of problem: Numerical solution of the TDSE and extraction of various types of electron spectra.
Solution method: The electronic wavefunction is expanded in B-spline functions and spherical harmonics, whose range is chosen carefully to reduce the total number of partial waves for non-linearly polarized lasers. The Crank-Nicolson approach combined with an operator-splitting scheme propagates the wavefunction in time, in either velocity gauge or length gauge. Matrix inversions are carried out with either dense or sparse linear algebra solvers, depending on the structure of the matrices. The t-SURFF method and projections onto either the scattering states or plane waves are provided for accurate extraction of the momentum distributions.
Additional comments including restrictions and unusual features: Only lasers within the dipole approximation are supported. For the multi-polar potentials, only pure Coulombic ones are supported. Routines for solving exact scattering states have been implemented only for centrifugal potentials. The code is written in C++17 and can be compiled only on platforms that support the AVX instruction sets. An optional extension of the propagation algorithm using AVX-512 intrinsics is provided.
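To make the Crank-Nicolson propagation mentioned in the abstract concrete, the following is a minimal sketch in one dimension. It is not the QPC-TDSE implementation: it assumes a uniform finite-difference grid instead of B-splines and spherical harmonics, a time-independent potential, atomic units, and a wavefunction that vanishes outside the grid. The names `crank_nicolson_step` and `norm_sq` are hypothetical. The step solves (1 + i dt/2 H) psi_new = (1 - i dt/2 H) psi_old with a tridiagonal (Thomas) solver.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// One Crank-Nicolson step for i dpsi/dt = H psi with
// H = -1/2 d^2/dx^2 + V(x), discretized by second-order finite differences.
// Assumption: psi is zero outside the grid (Dirichlet boundaries).
void crank_nicolson_step(std::vector<cplx>& psi, const std::vector<double>& V,
                         double dx, double dt) {
    const std::size_t n = psi.size();
    assert(V.size() == n);
    if (n == 0) return;

    const cplx I(0.0, 1.0);
    const double k = 1.0 / (dx * dx);
    // Off-diagonal element of (1 + i dt/2 H); H has -k/2 off the diagonal.
    const cplx off = -I * (dt * k / 4.0);

    // Forward sweep of the Thomas algorithm; the right-hand side
    // (1 - i dt/2 H) psi_old is built on the fly from the old psi.
    std::vector<cplx> c(n), d(n);
    for (std::size_t j = 0; j < n; ++j) {
        const cplx h = I * (0.5 * dt * (k + V[j]));  // i dt/2 * H_jj
        cplx rhs = (1.0 - h) * psi[j];
        if (j > 0) rhs -= off * psi[j - 1];
        if (j + 1 < n) rhs -= off * psi[j + 1];

        cplx denom = 1.0 + h;
        if (j > 0) {
            denom -= off * c[j - 1];
            rhs -= off * d[j - 1];
        }
        c[j] = off / denom;
        d[j] = rhs / denom;
    }

    // Back substitution overwrites psi with the new time level.
    psi[n - 1] = d[n - 1];
    for (std::size_t j = n - 1; j-- > 0;) {
        psi[j] = d[j] - c[j] * psi[j + 1];
    }
}

// Discrete norm: sum over |psi_j|^2 times the grid spacing.
double norm_sq(const std::vector<cplx>& psi, double dx) {
    double s = 0.0;
    for (const cplx& p : psi) s += std::norm(p);
    return s * dx;
}
```

Because the discretized Hamiltonian is real and symmetric, the update is unitary, so `norm_sq` should stay constant up to rounding error from step to step. That makes it a convenient sanity check for a propagator of this kind.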

The article addresses the security and efficiency of software implementations of symmetric block ciphers. For implementations on low-end CPUs (8/16/32-bit microcontrollers), the priority is increased resistance to power-analysis attacks. For implementations on high-end CPUs (x86, ARM Cortex-A), the priority is eliminating vulnerability to timing and cache attacks. The authors use a bitslice approach to implement block ciphers securely; its potential advantages include high speed and low demand on computing resources. However, known bitslicing methods have a significant limitation: they work only with deterministic S-Boxes or with arbitrary S-Boxes of smaller sizes. The paper proposes a new heuristic method for the bitsliced representation of cryptographic 8×8 S-Boxes that contain randomly generated values, which cannot be described by algebraic expressions. The method decomposes the truth table that describes the S-Box into two parts. One part of the table forms logical masks, and the other is split into bit vectors. An exhaustive search is used to find a logical description of these vectors. Once all vectors have been described, the two parts of the table are combined into one using logical operations. The method targets software implementation in the logical basis {AND, OR, XOR, NOT} and minimizes arbitrary 8×8 S-Boxes. It can be implemented with standard logical instructions on any 8/16/32/64-bit processor. Logical SIMD instructions from the SSE, AVX, and AVX-512 extensions for x86-64 processors can also be used, and their long registers provide high performance. The authors developed software that searches for bitsliced representations of a given S-Box and automatically generates C++ code for it based on SSE, AVX, and AVX-512 instructions. The effectiveness of the method was investigated on the S-Boxes of known block ciphers, in particular the Ukrainian encryption standard "Kalyna". The developed algorithm requires almost half as many gates for the bitsliced description of an arbitrary S-Box as the best known algorithm (370 gates versus 680). For ciphers that use two or four S-Box tables, joint minimization can yield up to 330 or 300 gates per table, respectively. Keywords: bitslicing; S-Box; logical minimization; SIMD; x86-64 CPU; software implementation; block ciphers.
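As an illustration of what a bitsliced S-Box evaluation looks like, here is a minimal sketch. It does not implement the heuristic minimization described in the abstract. It assumes a hypothetical 3-bit S-Box (`kSbox`) and uses a plain sum-of-minterms expansion, which is far from gate-minimal but uses only AND, OR, and NOT. Each `std::uint64_t` holds one bit position of 64 independent inputs, so one pass evaluates 64 S-Box lookups without any data-dependent table access. All function names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical 3-bit S-Box, chosen only for illustration.
constexpr std::array<std::uint8_t, 8> kSbox = {5, 2, 7, 0, 3, 6, 1, 4};

using Slice = std::uint64_t;  // one bit position across 64 independent lanes

// Transpose 64 three-bit values into 3 slices: bit b of values[lane]
// becomes bit `lane` of slices[b].
std::array<Slice, 3> pack(const std::array<std::uint8_t, 64>& values) {
    std::array<Slice, 3> s{};
    for (std::size_t lane = 0; lane < 64; ++lane)
        for (std::size_t b = 0; b < 3; ++b)
            s[b] |= static_cast<Slice>((values[lane] >> b) & 1u) << lane;
    return s;
}

// Inverse of pack.
std::array<std::uint8_t, 64> unpack(const std::array<Slice, 3>& s) {
    std::array<std::uint8_t, 64> v{};
    for (std::size_t lane = 0; lane < 64; ++lane)
        for (std::size_t b = 0; b < 3; ++b)
            v[lane] = static_cast<std::uint8_t>(
                v[lane] | (((s[b] >> lane) & 1u) << b));
    return v;
}

// Evaluate the S-Box on all 64 lanes using only AND, OR, NOT.
// For every possible input value v, `match` has a 1 in each lane whose
// input equals v; it is OR-ed into every output slice where kSbox[v] has a 1.
std::array<Slice, 3> sbox_bitsliced(const std::array<Slice, 3>& x) {
    std::array<Slice, 3> y{};
    for (unsigned v = 0; v < 8; ++v) {
        Slice match = ~Slice{0};
        for (std::size_t b = 0; b < 3; ++b)
            match &= ((v >> b) & 1u) ? x[b] : ~x[b];
        for (std::size_t b = 0; b < 3; ++b)
            if ((kSbox[v] >> b) & 1u) y[b] |= match;
    }
    return y;
}

// Compare the bitsliced result with a direct table lookup on every lane.
bool matches_table() {
    std::array<std::uint8_t, 64> in{};
    for (std::size_t lane = 0; lane < 64; ++lane)
        in[lane] = static_cast<std::uint8_t>((lane * 3 + 1) % 8);
    const auto out = unpack(sbox_bitsliced(pack(in)));
    for (std::size_t lane = 0; lane < 64; ++lane)
        if (out[lane] != kSbox[in[lane]]) return false;
    return true;
}
```

The branches in `sbox_bitsliced` depend only on the public table and the loop counters, never on the input data. The same logic carries over to wider registers by replacing `Slice` with a SIMD type and the operators with the corresponding logical intrinsics, which raises the number of lanes per pass.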

Articles published on AVX-512 Instruction

Truncated multiplication and batch software SIMD AVX512 implementation for faster Montgomery multiplications and modular exponentiation

Optimizing Dilithium Implementation with AVX2/-512

HAETAE: Shorter Lattice-Based Fiat-Shamir Signatures

Parallel Implementation of Lightweight Secure Hash Algorithm on CPU and GPU Environments

Improving Performance of Massive Text Real-Time Classification for Document Confidentiality Management

Gem5-AVX: Extension of the Gem5 Simulator to Support AVX Instruction Sets

Vectorization of CMSSW offline software

AVX-TSCHA: Leaking information through AVX extensions in commercial processors

WHFast512: A symplectic N-body integrator for planetary systems optimized with AVX512 instructions

QPC-TDSE: A parallel TDSE solver for atoms and small molecules in strong lasers

Fast Polarization-Adjusted Convolutional (PAC) Software Decoders: Algorithm and Implementation

Acceleration of Particle Swarm Optimization with AVX Instructions

AVX512Crypto: Parallel Implementations of Korean Block Ciphers Using AVX-512

Energy Efficiency of a New Parallel PIC Code for Numerical Simulation of Plasma Dynamics in Open Trap

VecDualSPHysics: A vectorized implementation of Smoothed Particle Hydrodynamics method for simulating fluid flows on multi-core processors

HSMA: An O(N) electrostatics package implemented in LAMMPS

Fast Implementation of Multiplication on Polynomial Rings

Faster multiplication over $\mathbb{F}_2[X]$ using AVX512 instruction set and VPCLMULQDQ instruction

Software optimization for fast encoding and decoding of Reed-Solomon codes

A heuristic method for bitsliced representation of randomly generated 8×8 cryptographic S-Boxes
