Abstract
Two essential problems in computer algebra, namely polynomial factorization and polynomial greatest common divisor computation, can be efficiently solved thanks to multiple polynomial evaluations in two variables using modular arithmetic. In this article, we focus on the efficient computation of such polynomial evaluations on one single CPU core. We first show how to leverage SIMD (single instruction, multiple data) computing for modular arithmetic on AVX2 and AVX-512 units, using both intrinsics and OpenMP compiler directives. Then we increase the operational intensity and exploit instruction-level parallelism in order to improve the compute efficiency of these polynomial evaluations. Altogether, this yields performance gains of up to about 5x on AVX2 and 10x on AVX-512.
Highlights
Computer Algebra, also called symbolic computation, consists of developing algorithms and data structures for manipulating mathematical objects in an exact way
We show that the optimized AVX implementation of van der Hoeven et al. [14] can safely be used in our polynomial evaluation; we propose the first implementation of such a modular multiplication algorithm on AVX-512, as well as the corresponding FP-based modular addition
We justify the choice of a modular multiplication algorithm relevant for HPC and SIMD computing
Summary
Computer Algebra, also called symbolic computation, consists of developing algorithms and data structures for manipulating mathematical objects in an exact way. Computing modulo a 64-bit prime p makes it possible to use machine integers and native CPU operations, instead of arbitrary-precision integers. Since these partial modular polynomial evaluations are currently a performance bottleneck for polynomial factorizations and gcd computations, we aim in this article to speed up their computation on modern CPUs. We focus here on a single compute server, since most symbolic computations are usually performed on personal workstations. We show how to significantly improve the performance of the modular polynomial evaluation by increasing the operational intensity via data reuse, and by filling the pipelines of the floating-point units. This is achieved thanks to the introduction of multiple “dependent” and “independent” evaluations and loop unrolling.
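As a baseline for the kernels discussed above, the following hedged sketch shows polynomial evaluation modulo a word-size prime via Horner's rule, plus two "independent" evaluations interleaved to expose instruction-level parallelism. The function names are illustrative, and the scalar `%`-based reduction stands in for the paper's SIMD modular arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Horner's rule modulo p, with coefficients and the point already
 * reduced mod p. unsigned __int128 (a GCC/Clang extension) keeps the
 * intermediate product exact before reduction. */
static uint64_t eval_mod(const uint64_t *c, int n, uint64_t x, uint64_t p) {
    uint64_t r = 0;
    for (int i = n - 1; i >= 0; --i)
        r = (uint64_t)(((unsigned __int128)r * x + c[i]) % p);
    return r;
}

/* Two independent evaluations interleaved: the two dependency chains
 * can proceed in parallel and help fill the arithmetic pipelines. */
static void eval2_mod(const uint64_t *c, int n, uint64_t x0, uint64_t x1,
                      uint64_t p, uint64_t *r0, uint64_t *r1) {
    uint64_t a = 0, b = 0;
    for (int i = n - 1; i >= 0; --i) {
        a = (uint64_t)(((unsigned __int128)a * x0 + c[i]) % p);
        b = (uint64_t)(((unsigned __int128)b * x1 + c[i]) % p);
    }
    *r0 = a;
    *r1 = b;
}
```

Each iteration of plain Horner depends on the previous one, so a single evaluation cannot keep the pipelines busy; interleaving several independent points (and unrolling the loop) is the technique the article uses to raise throughput.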