Abstract

In modular arithmetic, the most computationally expensive operation is the modular inversion of an integer. Its cost can be a problem for cryptanalytic applications where performance is a key concern, such as the transformation from projective to affine coordinates within a Pollard rho implementation for solving the discrete logarithm problem on an elliptic curve. Good platforms for such computations are single-instruction multiple-data architectures such as graphics processing units (GPUs), owing to their highly competitive performance/price ratio. Unfortunately, when each thread computes a single inversion, the whole GPU computation can be significantly slowed down by divergent threads. In this paper we describe a new algorithm for computing modular inversions on GPUs, based on Stein's binary GCD. By exploiting De Bruijn sequences and Montgomery arithmetic, our version of Stein's algorithm is better suited to GPUs, since it reduces the thread divergence of the original algorithm. The paper includes a brief report on tests of our algorithm in six prime fields whose characteristics range in size from 109 to 359 bits, and in the two prime fields associated with the Mersenne primes \(2^{521} - 1\) and \(2^{607} - 1\).
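As background for the ingredients named above, the following C sketch shows the classic De Bruijn trick for counting trailing zeros without a data-dependent loop, together with a plain single-word binary extended GCD (Stein-style) modular inversion. It is only an illustration of the standard building blocks, under assumed function names and a 32-bit word size; it is not the authors' GPU algorithm, which combines these ideas with Montgomery arithmetic on multi-precision field elements to reduce thread divergence.

#include <stdint.h>

/* De Bruijn-based count of trailing zeros: a table lookup instead of a
 * data-dependent loop over the low bits. (Illustrative only; the paper
 * does not publish this exact routine.) */
static const int debruijn_ctz[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

static inline int ctz32(uint32_t v) {
    /* Isolate the lowest set bit, multiply by the De Bruijn constant
     * 0x077CB531, and index the table with the top 5 bits of the product. */
    return debruijn_ctz[((uint32_t)((v & -v) * 0x077CB531u)) >> 27];
}

/* Modular inverse of a modulo an odd prime p via the classic binary
 * extended GCD (Stein-style). The inner while-loops that strip factors
 * of two are exactly the divergence-prone steps the paper's method
 * addresses with De Bruijn sequences and Montgomery arithmetic. */
uint32_t mod_inverse_binary(uint32_t a, uint32_t p) {
    uint64_t u = a, v = p;
    uint64_t x1 = 1, x2 = 0;  /* invariants: a*x1 = u (mod p), a*x2 = v (mod p) */
    while (u != 1 && v != 1) {
        while ((u & 1) == 0) {                /* strip a factor of two from u */
            u >>= 1;
            x1 = (x1 & 1) ? (x1 + p) >> 1 : x1 >> 1;
        }
        while ((v & 1) == 0) {                /* strip a factor of two from v */
            v >>= 1;
            x2 = (x2 & 1) ? (x2 + p) >> 1 : x2 >> 1;
        }
        if (u >= v) { u -= v; x1 = (x1 >= x2) ? x1 - x2 : x1 + p - x2; }
        else        { v -= u; x2 = (x2 >= x1) ? x2 - x1 : x2 + p - x1; }
    }
    return (uint32_t)((u == 1) ? x1 : x2);
}

The two inner while-loops run a data-dependent number of iterations, which is the source of thread divergence that the paper targets.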
