Architectural Enhancements for Montgomery Multiplication on Embedded RISC Processors

Johann Großschädl,Guy-Armand Kamendje

doi:10.1007/978-3-540-45203-4_32

Abstract

Montgomery multiplication normally spends over 90% of its execution time in inner loops executing some kind of multiply-and-add operations. The performance of these critical code sections can be greatly improved by customizing the processor’s instruction set for low-level arithmetic functions. In this paper, we investigate the potential of architectural enhancements for multiple-precision Montgomery multiplication according to the so-called Finely Integrated Product Scanning (FIPS) method. We present instruction set extensions to accelerate the FIPS inner loop operation based on the availability of a multiply/accumulate (MAC) unit with a wide accumulator. Finally, we estimate the execution time of a 1024-bit Montgomery multiplication on an extended MIPS32 core and discuss the impact of the multiplier latency.

Full Text