Tensor Contraction Research Articles

Machine learning and artificial intelligence (AI) applications often rely on performing many small matrix operations—in particular general matrix–matrix multiplication (GEMM). These operations are usually performed in a reduced precision, such as the 16-bit floating-point format (i.e., half precision or FP16). The GEMM operation is also very important for dense linear algebra algorithms, and half-precision GEMM operations can be used in mixed-precision linear solvers. Therefore, high-performance batched GEMM operations in reduced precision are significantly important, not only for deep learning frameworks, but also for scientific applications that rely on batched linear algebra, such as tensor contractions and sparse direct solvers.This paper presents optimized batched GEMM kernels for graphics processing units (GPUs) in FP16 arithmetic. The paper addresses both real and complex half-precision computations on the GPU. The proposed design takes advantage of the Tensor Core technology that was recently introduced in CUDA-enabled GPUs. With eight tuning parameters introduced in the design, the developed kernels have a high degree of flexibility that overcomes the limitations imposed by the hardware and software (in the form of discrete configurations for the Tensor Core APIs). For real FP16 arithmetic, performance speedups are observed against cuBLAS for sizes up to 128, and range between 1.5× and 2.5×. For the complex FP16 GEMM kernel, the speedups are between 1.7× and 7× thanks to a design that uses the standard interleaved matrix layout, in contrast with the planar layout required by the vendor’s solution. The paper also discusses special optimizations for extremely small matrices, where even higher performance gains are achievable.

Classical simulation of quantum computation is necessary for studying the numerical behavior of quantum algorithms, as there does not yet exist a large viable quantum computer on which to perform numerical tests. Tensor network (TN) contraction is an algorithmic method that can efficiently simulate some quantum circuits, often greatly reducing the computational cost over methods that simulate the full Hilbert space. In this study we implement a tensor network contraction program for simulating quantum circuits using multi-core compute nodes. We show simulation results for the Max-Cut problem on 3- through 7-regular graphs using the quantum approximate optimization algorithm (QAOA), successfully simulating up to 100 qubits. We test two different methods for generating the ordering of tensor index contractions: one is based on the tree decomposition of the line graph, while the other generates ordering using a straight-forward stochastic scheme. Through studying instances of QAOA circuits, we show the expected result that as the treewidth of the quantum circuit’s line graph decreases, TN contraction becomes significantly more efficient than simulating the whole Hilbert space. The results in this work suggest that tensor contraction methods are superior only when simulating Max-Cut/QAOA with graphs of regularities approximately five and below. Insight into this point of equal computational cost helps one determine which simulation method will be more efficient for a given quantum circuit. The stochastic contraction method outperforms the line graph based method only when the time to calculate a reasonable tree decomposition is prohibitively expensive. Finally, we release our software package, qTorch (Quantum TensOR Contraction Handler), intended for general quantum circuit simulation. For a nontrivial subset of these quantum circuits, 50 to 100 qubits can easily be simulated on a single compute node.

Tensor Contraction Research Articles

Related Topics

Articles published on Tensor Contraction

Matrix multiplication on batches of small matrices in half and half-complex precisions

DecomVQANet: Decomposing visual question answering deep network via tensor decomposition and regression

An Efficient Galerkin Averaging-Incremental Harmonic Balance Method Based on the Fast Fourier Transform and Tensor Contraction

Методы оптимизации обобщенных тензорных сверток

Hashin–Shtrikman bounds of periodic linear elastic media with cubic symmetry

Fast Bilinear Algorithms for Symmetric Tensor Contractions

Magnetic dipole localization and magnetic moment estimation method based on normalized source strength

The multi-configurational time-dependent Hartree approach in optimized second quantization: Imaginary time propagation and particle number conservation

Numerical assessment for accuracy and GPU acceleration of TD-DMRG time evolution schemes.

The influence of coordinates in robotic manipulability analysis

Supersymmetric Landau-Ginzburg tensor models

Towards a polynomial algorithm for optimal contraction sequence of tensor networks from trees.

RTNI—A symbolic integrator for Haar-random tensor networks

Massive-Parallel Implementation of the Resolution-of-Identity Coupled-Cluster Approaches in the Numeric Atom-Centered Orbital Framework for Molecular Systems.

Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs

Spin Summations

QTorch: The quantum tensor contraction handler.

On extending and optimising the direct product decomposition

Equation-of-motion coupled-cluster theory based on the 4-component Dirac-Coulomb(-Gaunt) Hamiltonian. Energies for single electron detachment, attachment, and electronically excited states.

High-Performance Generalized Tensor Operations

Lead the way for us

Editage

Paperpal

R Discovery

Mind the Graph

Tensor Contraction Research Articles

Related Topics

Articles published on Tensor Contraction

Matrix multiplication on batches of small matrices in half and half-complex precisions

DecomVQANet: Decomposing visual question answering deep network via tensor decomposition and regression

An Efficient Galerkin Averaging-Incremental Harmonic Balance Method Based on the Fast Fourier Transform and Tensor Contraction

Методы оптимизации обобщенных тензорных сверток

Hashin–Shtrikman bounds of periodic linear elastic media with cubic symmetry

Fast Bilinear Algorithms for Symmetric Tensor Contractions

Magnetic dipole localization and magnetic moment estimation method based on normalized source strength

The multi-configurational time-dependent Hartree approach in optimized second quantization: Imaginary time propagation and particle number conservation

Numerical assessment for accuracy and GPU acceleration of TD-DMRG time evolution schemes.

The influence of coordinates in robotic manipulability analysis

Supersymmetric Landau-Ginzburg tensor models

Towards a polynomial algorithm for optimal contraction sequence of tensor networks from trees.

RTNI—A symbolic integrator for Haar-random tensor networks

Massive-Parallel Implementation of the Resolution-of-Identity Coupled-Cluster Approaches in the Numeric Atom-Centered Orbital Framework for Molecular Systems.

Batched Triangular Dense Linear Algebra Kernels for Very Small Matrix Sizes on GPUs

Spin Summations

QTorch: The quantum tensor contraction handler.

On extending and optimising the direct product decomposition

Equation-of-motion coupled-cluster theory based on the 4-component Dirac-Coulomb(-Gaunt) Hamiltonian. Energies for single electron detachment, attachment, and electronically excited states.

High-Performance Generalized Tensor Operations