Exact Multiplication Research Articles

We study coded distributed matrix multiplication from an approximate recovery viewpoint. We consider a system of $P$ computation nodes where each node stores $1/m$ of each multiplicand via linear encoding. Our main result shows that the matrix product can be recovered with $\epsilon$ relative error from any $m$ of the $P$ nodes for any $\epsilon > 0$. We obtain this result through a careful specialization of MatDot codes, a class of matrix multiplication codes previously developed in the context of exact recovery ($\epsilon = 0$). Since prior results showed that MatDot codes achieve the best exact recovery threshold for a class of linear coding schemes, our result shows that allowing for mild approximations leads to a system that is nearly twice as efficient as exact reconstruction: $m$ nodes suffice instead of the $2m - 1$ that MatDot codes require for exact recovery.
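To make the exact-recovery baseline concrete, the following Python/NumPy sketch implements a basic MatDot-style encoder and decoder. It illustrates exact MatDot recovery only, not the paper's ε-approximate specialization. Assumptions: A is split column-wise and B row-wise into m blocks, the node with evaluation point x receives p_A(x) = Σ A_i x^i and p_B(x) = Σ B_j x^(m-1-j), and the helper names `matdot_encode`/`matdot_decode` and the real-valued evaluation points are hypothetical choices. Each node's output p_A(x) p_B(x) is a matrix polynomial of degree 2m - 2 whose x^(m-1) coefficient is Σ A_i B_i = AB, so exact decoding needs 2m - 1 responses.

```python
import numpy as np


def matdot_encode(A, B, m, points):
    """Return the (p_A(x), p_B(x)) pair sent to the node with evaluation point x."""
    A_blocks = np.split(A, m, axis=1)   # A = [A_0 | A_1 | ... | A_{m-1}]
    B_blocks = np.split(B, m, axis=0)   # B = [B_0; B_1; ...; B_{m-1}] stacked by rows
    shares = []
    for x in points:
        pA = sum(Ai * x**i for i, Ai in enumerate(A_blocks))
        pB = sum(Bj * x**(m - 1 - j) for j, Bj in enumerate(B_blocks))
        shares.append((pA, pB))
    return shares


def matdot_decode(results, points, m):
    """Recover A @ B exactly from any 2m - 1 node outputs p_A(x) @ p_B(x)."""
    k = 2 * m - 1
    if len(results) < k:
        raise ValueError(f"exact MatDot decoding needs {k} results, got {len(results)}")
    xs = np.asarray(points[:k], dtype=float)
    V = np.vander(xs, k, increasing=True)           # V[r, d] = xs[r] ** d
    Y = np.stack(results[:k])                       # shape (k, rows, cols)
    coeffs = np.linalg.solve(V, Y.reshape(k, -1))   # one row of coefficients per degree
    return coeffs[m - 1].reshape(Y.shape[1:])       # degree m-1 coefficient = sum_i A_i @ B_i


rng = np.random.default_rng(0)
m, P = 2, 5
A = rng.standard_normal((4, 6))
B = rng.standard_normal((6, 3))
points = [1.0, 2.0, -1.0, 0.5, 3.0]                 # hypothetical distinct evaluation points
shares = matdot_encode(A, B, m, points)
results = [pA @ pB for pA, pB in shares]            # work done by each of the P nodes

subset = [1, 3, 4]                                  # any 2m - 1 = 3 responsive nodes
C = matdot_decode([results[i] for i in subset], [points[i] for i in subset], m)
print(np.allclose(C, A @ B))                        # True
```

Real-valued points keep the sketch short; practical coded-computing systems often pick evaluation points for numerical conditioning or work over a finite field, which this example does not attempt.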
For Entangled-Poly codes, which generalize MatDot codes, we show that approximation reduces the recovery threshold from $p^{2}q + q - 1$ to $p^{2}q$ when the input matrices $A$ and $B$ are split into $p \times q$ and $q \times p$ grids of equal-sized submatrices, respectively.
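The two thresholds can be tabulated directly. The short Python sketch below simply evaluates the expressions quoted above; the function names are hypothetical. Setting p = 1 and q = m recovers the MatDot case, where the exact threshold 2m - 1 drops to m.

```python
def entangled_poly_exact_threshold(p: int, q: int) -> int:
    # Exact recovery threshold quoted above: p^2 q + q - 1
    return p * p * q + q - 1


def entangled_poly_approx_threshold(p: int, q: int) -> int:
    # epsilon-approximate recovery threshold quoted above: p^2 q
    return p * p * q


for p, q in [(1, 4), (2, 2), (2, 4), (3, 3)]:
    exact = entangled_poly_exact_threshold(p, q)
    approx = entangled_poly_approx_threshold(p, q)
    print(f"p={p}, q={q}: exact={exact}, approx={approx}, nodes saved={exact - approx}")
```

The saving is always q - 1 nodes, so the relative gain is largest when p is small; for p = 1 it approaches the factor of two stated for MatDot codes.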

Data hazards cause severe pipeline performance degradation in data-intensive computing. To improve performance under a pessimistic assumption about pipeline efficiency, a high-speed and energy-efficient variable-latency speculating Booth multiplier (VLSBM) is proposed that successively performs a speculating phase and a correcting phase. To shorten the critical path, the VLSBM partitions the partial products into an (n-z)-bit least significant part (LSP) and a self-reliant (n+z)-bit most significant part (MSP), and an estimation function stochastically predicts the carry into the MSP, which allows the partial-product accumulations of the two parts to be computed independently. When the carry prediction is accurate, the data dependence is hidden and the correcting phase is bypassed, preserving the potential speed-up of the pipelined datapath. If the prediction is inaccurate, the speculation is flushed and the correcting phase is executed to obtain the exact product. Simulation results verify the effectiveness of the proposed VLSBM. When applied to a DSP algorithm with a data hazard (or dependence) probability $P_D$, $0 \le P_D \le 1$, the proposed VLSBM outperforms both the original Booth multiplier and the fastest conventional well-pipelined modified Booth multiplier when $P_D > 0.32$. For high $P_D$ ($P_D \approx 1$), the proposed VLSBM achieves a speed-up of approximately 1.47 times over the fastest conventional pipelined Booth multiplier (UMC 90 nm CMOS), while saving approximately 25.4% of energy per multiplication and 7% of area. For multiplications in three multimedia applications (JPEG compression, object detection, and H.264/AVC decoding), the proposed VLSBM improves the speed-up ratio by approximately 1.0 to 1.4 times and reduces the cycle count ratio by approximately 1.3 to 1.8 times compared with the fastest conventional two-stage pipelined Booth multiplier.
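The speculate-then-correct idea can be illustrated with a small behavioral model. The Python sketch below is not the VLSBM hardware and makes several assumptions: it uses radix-4 (modified) Booth recoding of an n-bit two's-complement multiplier, splits every partial product at bit n - z into an LSP and an MSP, and uses a hypothetical carry estimator that sums only the top t bits of each LSP (the paper's stochastic estimation function is not reproduced here). The MSP is accumulated with the predicted carry; if that differs from the true carry out of the LSP, the speculative MSP is discarded and recomputed, so the returned product is always exact.

```python
import random


def booth_radix4_partial_products(a: int, b: int, n: int) -> list[int]:
    """Radix-4 (modified) Booth recoding of the n-bit two's-complement multiplier b.

    Digit d_i = -2*b[2i+1] + b[2i] + b[2i-1] lies in {-2, -1, 0, 1, 2},
    so only n/2 shifted partial products d_i * a * 4**i are produced.
    """
    assert n % 2 == 0
    pps = []
    prev = 0  # b[-1] is defined as 0
    for i in range(n // 2):
        b0 = (b >> (2 * i)) & 1
        b1 = (b >> (2 * i + 1)) & 1
        digit = -2 * b1 + b0 + prev
        pps.append((digit * a) << (2 * i))
        prev = b1
    return pps


def speculative_multiply(a: int, b: int, n: int, z: int, t: int = 2) -> tuple[int, bool]:
    """Return (a * b, corrected); corrected is True when the carry prediction failed."""
    lsp_bits = n - z                     # width of the least significant part
    assert 0 < t <= lsp_bits
    mask = (1 << lsp_bits) - 1
    pps = booth_radix4_partial_products(a, b, n)

    # Split every partial product: pp == (pp >> lsp_bits) * 2**lsp_bits + (pp & mask)
    lows = [pp & mask for pp in pps]
    highs = [pp >> lsp_bits for pp in pps]

    # LSP accumulation: low product bits plus the true carry into the MSP.
    # (Computed up front here only so the model can check the prediction.)
    low_sum = sum(lows)
    lsp = low_sum & mask
    true_carry = low_sum >> lsp_bits

    # Speculating phase: hypothetical estimator using only the top t bits of each LSP
    predicted_carry = sum(low >> (lsp_bits - t) for low in lows) >> t
    msp = sum(highs) + predicted_carry
    if predicted_carry == true_carry:
        return msp * (1 << lsp_bits) + lsp, False

    # Correcting phase: flush the speculative MSP and redo it with the exact carry
    msp = sum(highs) + true_carry
    return msp * (1 << lsp_bits) + lsp, True


random.seed(1)
n, z = 16, 4
min_val, max_val = -(1 << (n - 1)), (1 << (n - 1)) - 1
trials, flushes = 10_000, 0
for _ in range(trials):
    a, b = random.randint(min_val, max_val), random.randint(min_val, max_val)
    product, corrected = speculative_multiply(a, b, n, z)
    assert product == a * b  # exact in both the speculative and the corrected path
    flushes += corrected
print(f"carry mispredictions: {flushes}/{trials}")
```

Because this estimator ignores the lower LSP bits, it can only underestimate the carry; the misprediction count it prints says nothing about the accuracy of the hardware predictor or about the $P_D > 0.32$ break-even point reported above.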

Articles published on Exact Multiplication

Efficient classical algorithms for simulating symmetric quantum systems

Digital Image Blending by Inexact Multiplication

ϵ-Approximate Coded Matrix Multiplication Is Nearly Twice as Efficient as Exact Multiplication

A low‐cost compensated approximate multiplier for Bfloat16 data processing on convolutional neural network inference

An Approximate GEMM Unit for Energy-Efficient Object Detection

Dual-task studies of working memory and arithmetic performance: A meta-analysis

The semantic network supports approximate computation

The bilinear complexity and practical algorithms for matrix multiplication

Design and Implementation of High-Speed and Energy-Efficient Variable-Latency Speculating Booth Multiplier (VLSBM)

The solution and duality of imprecise network problems

Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates

Cognitive assessment of function knowledge

Fast Exact Multiplication by the Hessian

A CMOS 13-b cyclic RSD A/D converter

Cytotaxonomical and genetical studies in Urginea Steinh. Species from India

Karyomorphological analysis of different species and varieties of Calathea, Maranta and Stromanthe of Marantaceae

A ratio-independent algorithmic analog-to-digital conversion technique

An analytical expression for the multiplication factor M in semiconductors with equal ionization coefficients

A floating-point technique for extending the available precision

Magnetic frequency multipliers
